Constrained General Side-Information
Abstract

We investigate Wyner-Ziv coding in which the statistics of the principal source are known but the statistics
of the channel generating the side-information are unknown except that the channel belongs to a certain class. The
class consists of channels such that the distortion between the principal source and the side-information is smaller
than a threshold, but the channels may be neither stationary nor ergodic. In this situation, we define a new
rate-distortion function as the minimum rate such that there exists a Wyner-Ziv code that is universal for every
channel in the class. We then show an upper bound and a lower bound on the rate-distortion function, and derive a
matching condition under which the upper and lower bounds coincide. The relation between the new rate-distortion
function and the rate-distortion function of the Heegard-Berger problem is also discussed.

Index Terms 

Average Distortion, Heegard-Berger Problem, Maximum Distortion, Universal Coding, Wyner-Ziv Problem 

I. Introduction 

In the seminal paper [1], Wyner and Ziv characterized the rate-distortion function of lossy source coding with
side-information at the decoder (see Fig. 1). In this paper, we consider universal coding for this problem where
the statistics of the principal source are known but the channel from the principal source to the side-information is
unknown except that it is in a certain class.

To motivate the problem setting investigated in this paper, let us first consider the following practical situation.
Suppose that the decoder already has a lossy compressed version of the principal source, and wants to obtain a refined
one. The encoder does not know how the previously transmitted lossy version was encoded, but knows that the quality
of the lossy version is guaranteed to be above a certain level. What is the minimum additional rate that must be
transmitted by the encoder so that the quality of the refined version is above a required level?

The first author is with the Department of Information Science and Intelligent Systems, University of Tokushima, 2-1, Minami-josanjima,
Tokushima, 770-8506, Japan, e-mail: shun-wata@is.tokushima-u.ac.jp.

The second author is with the Department of Computer and Communication Sciences, Wakayama University, Wakayama, 640-8510, Japan,
e-mail: kuzuoka@ieee.org.

Manuscript received ; revised 



February 4, 2013 



DRAFT 



JOURNAL OF LATEX CLASS FILES, VOL. 6, NO. 1, JANUARY 2007



The above mentioned situation can be modeled as follows. The principal source $X^n$ is a known i.i.d. source,
and the side-information $Y^n$ is generated from $X^n$ through a channel $W^n$. The statistical property of the channel
is unknown, but the distortion caused by the channel is smaller than a certain level $E$ for a prescribed distortion
measure. We assume that the distortion measure is additive, but the channel may be neither stationary nor ergodic.
We consider the maximum distortion constraint and the average distortion constraint for the channel. Since we
allow non-ergodic channels, the class of channels constrained by the maximum distortion and that constrained by
the average distortion are different. In this problem formulation, we are interested in the minimum rates $R_m(D|E)$
and $R_a(D|E)$ such that reproduction with distortion level $D$ is possible at the decoder for any channel in
the classes of channels satisfying the distortion level $E$ under the maximum distortion constraint and the average
distortion constraint, respectively. In other words, we are interested in the minimum rate such that universal coding
is possible for each class.

For the maximum distortion constrained class, we show an upper bound and a lower bound on $R_m(D|E)$. We also
derive a matching condition under which the upper and the lower bounds coincide. In particular, for the binary Hamming
example, we show that the matching condition is satisfied, and thus $R_m(D|E)$ is completely characterized.

For the average distortion constrained class, we show an upper bound and a lower bound on $R_a(D|E)$. For the
case with $D = 0$, i.e., the lossless reproduction case, we show that the upper and lower bounds coincide and
thus $R_a(0|E)$ is completely characterized. Surprisingly, $R_a(0|E) = H(X)$, i.e., the side-information is completely
useless, for any $E > 0$.

Some remarks on the related literature are in order.

For lossless source coding with side-information, i.e., the Slepian-Wolf network [2], the existence of a universal
code was first shown by Csiszár and Körner [3] (the existence of a linear universal code was also shown by Csiszár [4]).
After that, universal coding for the Slepian-Wolf network and other related lossless multi-terminal networks
was studied by several researchers [5], [6], [7].

For lossy source coding with side-information, i.e., the Wyner-Ziv network, the universal coding problem was
investigated by Merhav and Ziv [8], Jalali et al. [9], and Reani and Merhav [10]. It should be noted that the
universal codes proposed in these works are universal with respect to the statistics of the principal source but not
with respect to the channel generating the side-information, i.e., the statistics of the channel are known at the
encoder. Under the same condition, i.e., a known channel, it is also known that a universal code can be constructed
for the network with several decoders [11].

The universal Wyner-Ziv coding is also related to the Heegard-Berger problem [12], in which there are several
decoders that have their own side-information. The Heegard-Berger problem has not been solved in general; apart from
some special cases [16], [17], it has only been solved under the condition that there is a degraded partial order
between the channels generating the side-information [13], [14], [15]. It should be noted that there is no degraded
partial order among the channel class considered in this paper. Thus, the authors believe that the results in this
paper also shed some light on the unsolved Heegard-Berger problem.

Our problem setting can also be viewed as a kind of successive refinement coding [18], [19]. Successive
refinement coding consists of two layers of encoding. If the method used by the first layer encoder is not
known to the second layer encoder, this is exactly the situation of our problem setting.

Fig. 1. The Wyner-Ziv coding system.

Although universal coding for a distortion constrained class of channels is unfamiliar and new in the source
coding scenario, this kind of channel is quite natural when the channel is caused by an adversary, such as in the data
hiding scenario. Indeed, this kind of channel class is commonly used in the information theoretic analysis of
data hiding [20], [21], [22].

There are some technical differences between the data hiding problem and our problem. First, in the data hiding
problem, the channel output is only used for the decoding of the encoded message. On the other hand, in our
problem, the side-information is not only used for the decoding of the encoded source, but also for the estimation
at the decoder. This makes the problem difficult, and causes a gap between the upper bound and the lower bound
derived in this paper. Second, in the data hiding problem for the average distortion constrained class of channels,
it was shown that the achievable transmission rate is 0, i.e., the channel is completely useless [21]. On the other
hand, in our problem for the average distortion constrained class of channels, the side-information is useless for
bin coding, but it can be used for the estimation at the decoder. Thus, $R_a(D|E)$ can be strictly smaller than the
rate-distortion function $R(D)$ without any side-information for $D > 0$, though $R_a(0|E) = H(X)$.

The rest of this paper is organized as follows. In Section II, we introduce notations and the formal definition
of the problem. In Section III, we state our main theorems and show a representative example, i.e., the binary
Hamming example. In Sections IV and V, we present proofs of the main theorems.

II. Preliminaries 

A. Notations 

Henceforth, we adopt the following notation conventions. Random variables are denoted by capital letters
such as $X$, while their realizations are denoted by the respective lower case letters such as $x$. A random vector of
length $n$ is denoted by $X^n = (X_1, \ldots, X_n)$, while its realization is denoted by $x^n = (x_1, \ldots, x_n)$. The
alphabet of a random variable is denoted by a calligraphic letter such as $\mathcal{X}$, and its $n$-fold Cartesian product
is denoted by $\mathcal{X}^n$. The probability distribution of a random variable $X$ is denoted by $P_X$, and its $n$-fold
i.i.d. extension is denoted by $P_X^n$. For a given channel $W$, its $n$-fold i.i.d. extension is denoted by $W^{\times n}$,
while $W^n$ indicates a channel that is not necessarily i.i.d. The set of all probability distributions on $\mathcal{X}$ is
denoted by $\mathcal{P}(\mathcal{X})$. The set of all channels from $\mathcal{X}$ to $\mathcal{Y}$ is denoted by
$\mathcal{P}(\mathcal{Y}|\mathcal{X})$. The indicator function is denoted by $\mathbf{1}[\cdot]$. The entropy and the mutual
information are denoted in the standard way, such as $H(X)$ or $I(X;Y)$. For an input distribution $P$ of a channel $W$, we
sometimes use the notation $I(P, W)$ to designate the mutual information $I(X;Y)$, where the joint distribution of $(X, Y)$ is
$P(x)W(y|x)$. The variational distance between two distributions $P$ and $Q$ is denoted by $\|P - Q\|$. In the proofs
of our main theorems, we extensively use types and typicality, which are summarized in Appendix A.

B. Problem Formulation 

Let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ be an i.i.d. source. Let

$$e_n(x^n, y^n) := \frac{1}{n} \sum_{t=1}^{n} e(x_t, y_t)$$

be an additive distortion measure for side information. As a natural assumption, we assume that for each $x$ there
exists $y$ such that $e(x, y) = 0$. We also assume that the distortion is bounded, i.e., $e(x, y) \le e_{\max} < \infty$
for every $(x, y)$. For a given distortion $E > 0$, we consider the following maximum distortion constraint on the
side-information

$$\mathcal{W}_m(E) := \Big\{ \mathbf{W} = \{W^n\}_{n=1}^{\infty} : \forall \delta > 0\ \exists n_0(\delta) \text{ s.t. } \Pr\{ e_n(X^n, Y^n) > E \} \le \delta \ \ \forall n \ge n_0(\delta) \Big\}, \tag{1}$$

where $Y^n$ is the output of channel $W^n$ with input $X^n$. It should be noted that $n_0(\delta)$ depends on $\delta$ but
not on $\mathbf{W}$. We also consider the average distortion constraint

$$\mathcal{W}_a(E) := \Big\{ \mathbf{W} = \{W^n\}_{n=1}^{\infty} : e_n(P_{X^n}, W^n) \le E \ \ \forall n \ge 1 \Big\}, \tag{2}$$

where

$$e_n(P_{X^n}, W^n) := \mathbb{E}[e_n(X^n, Y^n)] = \sum_{x^n, y^n} P_X^n(x^n) W^n(y^n|x^n) e_n(x^n, y^n).$$

As it will be clarified later, the maximum distortion constraint and the average distortion constraint are completely 
different. 
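To make the difference between the two constraints concrete, the following minimal sketch evaluates both for a hypothetical non-ergodic binary channel that is not taken from the paper: with probability `p_bad` the whole block is flipped, and otherwise the block is passed unchanged. The function name, the channel, and the Hamming distortion are assumptions chosen for illustration only.

```python
from fractions import Fraction

def mixture_channel_stats(p_bad, threshold):
    """Non-ergodic 'all-or-nothing' binary channel (hypothetical example).

    With probability p_bad the whole block is flipped (per-letter Hamming
    distortion 1); otherwise it is passed unchanged (distortion 0). The block
    length n does not matter, because e_n is either 0 or 1 for every n.
    Returns (average distortion, Pr{e_n > threshold}).
    """
    outcomes = [(Fraction(1) - p_bad, Fraction(0)), (p_bad, Fraction(1))]
    average = sum(prob * dist for prob, dist in outcomes)
    prob_exceed = sum(prob for prob, dist in outcomes if dist > threshold)
    return average, prob_exceed

E = Fraction(1, 10)
avg, exceed = mixture_channel_stats(p_bad=E, threshold=E)
print(avg <= E)   # the average constraint (2) holds for every n
print(exceed)     # Pr{e_n > E} stays at p_bad for every n, so (1) fails
```

Under these assumptions the channel belongs to the average distortion class but not to the maximum distortion class, since the probability of exceeding $E$ does not vanish as $n$ grows.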

Let $\hat{\mathcal{X}}$ be the reproduction alphabet. Then, let

$$d_n(x^n, \hat{x}^n) := \frac{1}{n} \sum_{t=1}^{n} d(x_t, \hat{x}_t) \tag{3}$$

be an additive distortion measure for reproduction. We assume $d(x, \hat{x}) \le d_{\max} < \infty$ for every $(x, \hat{x})$.
We consider a (possibly stochastic) encoder

$$\varphi_n : \mathcal{X}^n \to \mathcal{M}_n$$

and a decoder

$$\psi_n : \mathcal{M}_n \times \mathcal{Y}^n \to \hat{\mathcal{X}}^n.$$

Definition 1: A rate $R$ is defined to be achievable if, for any $\varepsilon > 0$, there exist $n_0(\varepsilon)$ and a
sequence of codes $\{(\varphi_n, \psi_n)\}_{n=1}^{\infty}$ such that

$$\frac{1}{n} \log |\mathcal{M}_n| \le R + \varepsilon \tag{4}$$

and

$$\mathbb{E}[d_n(X^n, \psi_n(\varphi_n(X^n), Y^n))] \le D + \varepsilon \tag{5}$$

for every $\mathbf{W} \in \mathcal{W}_m(E)$ and $n \ge n_0(\varepsilon)$. We also define the rate-distortion function

$$R_m(D|E) := \inf\{R : R \text{ is achievable}\}.$$

We also define $R_a(D|E)$ by replacing $\mathcal{W}_m(E)$ with $\mathcal{W}_a(E)$.

Remark 2: As can be seen from the proof of Theorem 5, the theorem holds even if the average distortion
requirement in (5) is replaced by the maximum distortion requirement

$$\Pr\{ d_n(X^n, \psi_n(\varphi_n(X^n), Y^n)) > D + \varepsilon \} \le \varepsilon. \tag{6}$$

However, Theorem 8 does not hold if (5) is replaced by (6).

Let $R_{WZ}(D|W)$ be the rate-distortion function of the ordinary Wyner-Ziv problem in which the principal source
is $X$ and the side-information $Y$ is the output of the channel $W \in \mathcal{P}(\mathcal{Y}|\mathcal{X})$.

The rate-distortion function $R_m(D|E)$ (or $R_a(D|E)$) means that if $R > R_m(D|E)$ (or $R > R_a(D|E)$), there exists a
universal code that works well for every $\mathbf{W} \in \mathcal{W}_m(E)$ (or $\mathbf{W} \in \mathcal{W}_a(E)$). It should
be noted that this definition of universality is different from the ordinary definition of universality. Let

$$\mathcal{W}_{WZ}(R, D) := \{ W \in \mathcal{P}(\mathcal{Y}|\mathcal{X}) : R_{WZ}(D|W) \le R \}. \tag{7}$$

In the ordinary definition of universality, we require that there exists a code that works well for every
$W \in \mathcal{W}_{WZ}(R, D)$. This requirement seems much more severe than the requirement of $R_m(D|E)$ (or $R_a(D|E)$),
which will be discussed in more detail in Section III.

C. Heegard-Berger Problem 

For later use, we review the problem formulation of the Heegard-Berger (HB) problem [12] in this section. We
restrict our attention to the case with two decoders (see Fig. 2). Furthermore, we restrict our attention to the case
in which the alphabets of the side-information, the reproduction alphabets, and the distortion measures for both
decoders are common; they are denoted by $\mathcal{Y}$, $\hat{\mathcal{X}}$, and $d(\cdot, \cdot)$, respectively.

Let us consider the HB coding for an i.i.d. joint source $(X, Y_1, Y_2)$. The HB code consists of one encoder

$$\varphi_n^{HB} : \mathcal{X}^n \to \mathcal{M}_n$$

and two decoders

$$\psi_n^{HB_i} : \mathcal{M}_n \times \mathcal{Y}^n \to \hat{\mathcal{X}}^n, \quad i = 1, 2.$$

For a pair $(D_1, D_2)$ of distortions, a rate $R$ is defined to be $(D_1, D_2)$-achievable if, for any $\varepsilon > 0$,
there exists a sequence of HB codes $\{(\varphi_n^{HB}, \psi_n^{HB_1}, \psi_n^{HB_2})\}_{n=1}^{\infty}$ such that

$$\frac{1}{n} \log |\mathcal{M}_n| \le R + \varepsilon,$$

$$\mathbb{E}[d_n(X^n, \hat{X}_i^n)] \le D_i + \varepsilon, \quad i = 1, 2,$$

for sufficiently large $n$, where $\hat{X}_i^n = \psi_n^{HB_i}(\varphi_n^{HB}(X^n), Y_i^n)$ and $d_n$ is defined in (3). Then,
the HB rate-distortion function $R_{HB}(D_1, D_2|X, Y_1, Y_2)$ for $(X, Y_1, Y_2)$ is defined as the infimum of
$(D_1, D_2)$-achievable rates $R$.

Fix an i.i.d. source $P_X$. Then two side-information channels $W_1 : \mathcal{X} \to \mathcal{Y}$ and
$W_2 : \mathcal{X} \to \mathcal{Y}$ define an i.i.d. joint source $(X, Y_1, Y_2)$ whose joint distribution $P_{XY_1Y_2}$ is
given by $P_{XY_1Y_2}(x, y_1, y_2) = P_X(x) W_1(y_1|x) W_2(y_2|x)$, where $x \in \mathcal{X}$ and $y_1, y_2 \in \mathcal{Y}$.
In the following, we denote by $R_{HB}(D_1, D_2|W_1, W_2)$ the HB rate-distortion function $R_{HB}(D_1, D_2|X, Y_1, Y_2)$
for $(X, Y_1, Y_2)$ defined by $W_1$ and $W_2$.

Unfortunately, finding a single-letter expression for $R_{HB}(D_1, D_2|W_1, W_2)$ has been a long-standing open
problem. So, we consider a special case. Let

$$E^* := \min_{y \in \mathcal{Y}} \sum_{x} P_X(x) e(x, y)$$

and let $y^* \in \mathcal{Y}$ be a symbol that attains the minimum. Further, let $W^* : \mathcal{X} \to \mathcal{Y}$ be a
side-information channel such that $W^*(y^*|x) = 1$ irrespective of $x \in \mathcal{X}$. Then, let us consider the special
case where $W_1 = W^*$. This case is equivalent to the problem of "lossy coding when side-information may be absent".
Heegard and Berger [12] (see also [25]) showed the following.
Proposition 3 ([12]): We have

$$R_{HB}(D_1, D_2|W^*, W_2) = \min \left[ I(X; \hat{X}_1) + I(X; V|\hat{X}_1, Y) \right],$$

where the minimum is taken over all conditional distributions $P_{V\hat{X}_1|X}$ with
$|\mathcal{V}| \le |\mathcal{X} \times \hat{\mathcal{X}}| + 2$ and functions
$f : \mathcal{V} \times \hat{\mathcal{X}} \times \mathcal{Y} \to \hat{\mathcal{X}}$ such that

$$\mathbb{E}[d(X, \hat{X}_1)] \le D_1,$$

$$\mathbb{E}[d(X, f(V, \hat{X}_1, Y))] \le D_2.$$



III. Main Result 

A. Convex Form of WZ Rate-Distortion Function 

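We need the convex form of the Wyner-Ziv rate-distortion function introduced in [26]. Let $\mathcal{U}$ be the set of
all functions from $\mathcal{Y}$ to $\hat{\mathcal{X}}$. The set $\mathcal{U}$ includes the constant functions, i.e.,
$u(y) = \hat{x}$ $\forall y \in \mathcal{Y}$ for each $\hat{x} \in \hat{\mathcal{X}}$. We denote the set of constant
functions by $\bar{\mathcal{U}} \subset \mathcal{U}$.

Fig. 2. The Heegard-Berger coding system.

For a fixed channel $W \in \mathcal{P}(\mathcal{Y}|\mathcal{X})$ and a fixed test channel
$V \in \mathcal{P}(\mathcal{U}|\mathcal{X})$, we denote

$$d(V, W) := \sum_{u, x, y} P_X(x) V(u|x) W(y|x) d(x, u(y)).$$

For a fixed channel $W \in \mathcal{P}(\mathcal{Y}|\mathcal{X})$, let

$$\mathcal{V}(W, D) := \{ V \in \mathcal{P}(\mathcal{U}|\mathcal{X}) : d(V, W) \le D \}.$$

Let

$$\mathcal{W}_1(P_X, E) = \mathcal{W}_1(E) := \{ W \in \mathcal{P}(\mathcal{Y}|\mathcal{X}) : e(P_X, W) \le E \}$$

and

$$\mathcal{V}(E, D) := \{ V \in \mathcal{P}(\mathcal{U}|\mathcal{X}) : d(V, W) \le D \ \ \forall W \in \mathcal{W}_1(E) \}.$$

For $(V, W) \in \mathcal{P}(\mathcal{U}|\mathcal{X}) \times \mathcal{P}(\mathcal{Y}|\mathcal{X})$, let

$$\phi(V, W) := I(U; X) - I(U; Y) \tag{8}$$

$$= I(U; X|Y). \tag{9}$$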


Note that $\phi(\cdot, W)$ is a convex function for fixed $W$, which can be confirmed from (9), and $\phi(V, \cdot)$ is a
concave function for fixed $V$, which can be confirmed from (8).

With the above notations, the Wyner-Ziv rate-distortion function is given by

$$R_{WZ}(D|W) = \min_{V \in \mathcal{V}(W, D)} \phi(V, W).$$

Let

$$R_{WZ}(D|W, E) = \min_{V \in \mathcal{V}(E, D)} \phi(V, W)$$

be the pseudo rate-distortion function.
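Both rate-distortion functions above minimize the same objective $\phi(V, W)$ and differ only in the feasible set. As a minimal sketch of how $\phi$ in (8) can be evaluated numerically, the following code computes $I(U;X) - I(U;Y)$ for finite alphabets under the Markov chain $U - X - Y$. The helper names (`mutual_information`, `phi`, `bsc`, `h2`) and the dictionary representation are illustrative assumptions, not notation from the paper; the symbols $u$ are treated as abstract labels because $\phi$ does not depend on which function each label denotes.

```python
import math

def mutual_information(joint):
    """I(A;B) in bits from a dict {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

def phi(px, v, w):
    """phi(V, W) = I(U;X) - I(U;Y), where U - X - Y is a Markov chain.

    px[x] is the source, v[x][u] the test channel, w[x][y] the
    side-information channel.
    """
    joint_ux, joint_uy = {}, {}
    for x, p_x in px.items():
        for u, p_u in v[x].items():
            joint_ux[(u, x)] = joint_ux.get((u, x), 0.0) + p_x * p_u
            for y, p_y in w[x].items():
                joint_uy[(u, y)] = joint_uy.get((u, y), 0.0) + p_x * p_u * p_y
    return mutual_information(joint_ux) - mutual_information(joint_uy)

def bsc(p):
    """Binary symmetric channel with crossover probability p, as nested dicts."""
    return {0: {0: 1 - p, 1: p}, 1: {0: p, 1: 1 - p}}

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

px = {0: 0.5, 1: 0.5}
value = phi(px, bsc(0.1), bsc(0.2))
# For these symmetric choices the closed form is h2(0.1*0.8 + 0.9*0.2) - h2(0.1).
print(abs(value - (h2(0.26) - h2(0.1))) < 1e-9)
```

The same function can be used to spot-check the concavity of $\phi(V, \cdot)$ by comparing its value at a mixture of two channels with the corresponding mixture of values.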

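Lemma 4: The pseudo rate-distortion function $R_{WZ}(D|W, E)$ is a concave function of the channel, i.e.,

$$R_{WZ}(D|\lambda W_1 + (1-\lambda) W_2, E) \ge \lambda R_{WZ}(D|W_1, E) + (1-\lambda) R_{WZ}(D|W_2, E)$$

holds for $W_1, W_2 \in \mathcal{P}(\mathcal{Y}|\mathcal{X})$ and $0 \le \lambda \le 1$.
Proof: Let

$$\hat{V} = \mathop{\mathrm{argmin}}_{V \in \mathcal{V}(E, D)} \phi(V, \lambda W_1 + (1-\lambda) W_2).$$

Then, we have

$$\begin{aligned}
R_{WZ}(D|\lambda W_1 + (1-\lambda) W_2, E)
&= \phi(\hat{V}, \lambda W_1 + (1-\lambda) W_2) \\
&\ge \lambda \phi(\hat{V}, W_1) + (1-\lambda) \phi(\hat{V}, W_2) \\
&\ge \lambda \min_{V \in \mathcal{V}(E, D)} \phi(V, W_1) + (1-\lambda) \min_{V \in \mathcal{V}(E, D)} \phi(V, W_2) \\
&= \lambda R_{WZ}(D|W_1, E) + (1-\lambda) R_{WZ}(D|W_2, E),
\end{aligned}$$

where we used the concavity of $\phi(V, \cdot)$ for fixed $V$ in the first inequality. ■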

B. Statements of General Results 

For the maximum distortion class, we have the following. 
Theorem 5: We have

$$R_m(D|E) \ge \max_{W \in \mathcal{W}_1(E)} R_{WZ}(D|W) \tag{10}$$

$$= \max_{W \in \mathcal{W}_1(E)} \min_{V \in \mathcal{V}(W, D)} \phi(V, W) \tag{11}$$

and

$$R_m(D|E) \le \min_{V \in \mathcal{V}(E, D)} \max_{W \in \mathcal{W}_1(E)} \phi(V, W) \tag{12}$$

$$= \max_{W \in \mathcal{W}_1(E)} \min_{V \in \mathcal{V}(E, D)} \phi(V, W) \tag{13}$$

$$= \max_{W \in \mathcal{W}_1(E)} R_{WZ}(D|W, E). \tag{14}$$

Proof: See Section IV. ■
The difference between (11) and (13) lies in the constraint sets $\mathcal{V}(W, D)$ and $\mathcal{V}(E, D)$. Thus, we
have the following matching conditions.

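Corollary 6: Let $(V^*, W^*)$ be a saddle point satisfying

$$\phi(V^*, W^*) = \max_{W \in \mathcal{W}_1(E)} \min_{V \in \mathcal{V}(E, D)} \phi(V, W).$$

Suppose that

$$\hat{V} := \mathop{\mathrm{argmin}}_{V \in \mathcal{V}(W^*, D)} \phi(V, W^*) \in \mathcal{V}(E, D).$$

Then, we have

$$R_m(D|E) = \phi(V^*, W^*) = \max_{W \in \mathcal{W}_1(E)} \min_{V \in \mathcal{V}(E, D)} \phi(V, W).$$

Proof: We have

$$\begin{aligned}
\phi(\hat{V}, W^*) &= \min_{V \in \mathcal{V}(W^*, D)} \phi(V, W^*) \\
&\le \max_{W \in \mathcal{W}_1(E)} \min_{V \in \mathcal{V}(W, D)} \phi(V, W) \\
&\le R_m(D|E) \\
&\le \phi(V^*, W^*) \\
&= \min_{V \in \mathcal{V}(E, D)} \phi(V, W^*) \\
&\le \phi(\hat{V}, W^*).
\end{aligned}$$

■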



Corollary 7: Under the same notations as in Corollary 6, suppose that

$$\mathrm{supp}(\hat{V}) \subset \bar{\mathcal{U}}. \tag{15}$$

Then, we have

$$R_m(D|E) = \phi(V^*, W^*) = \max_{W \in \mathcal{W}_1(E)} \min_{V \in \mathcal{V}(E, D)} \phi(V, W).$$

Proof: When (15) is satisfied, the distortion

$$\sum_{u, x, y} P_X(x) \hat{V}(u|x) W(y|x) d(x, u(y))$$

does not depend on the channel $W$. Thus, Corollary 6 implies the statement of the present corollary. ■
For the average distortion class, we have the following. 
Theorem 8: We have

$$R_a(D|E) \ge \max_{\substack{\lambda, E_1, E_2, W_1, W_2: \\ \lambda E_1 + (1-\lambda) E_2 \le E \\ W_1 \in \mathcal{W}_1(E_1),\, W_2 \in \mathcal{W}_1(E_2)}} \ \min_{\substack{D_1, D_2: \\ \lambda D_1 + (1-\lambda) D_2 \le D}} R_{HB}(D_1, D_2|W_1, W_2), \tag{16}$$

where (i) the maximum is taken over all $0 \le \lambda \le 1$, $E_j \ge 0$, and side information channels $W_1, W_2$ such
that $\lambda E_1 + (1-\lambda) E_2 \le E$ and $W_j \in \mathcal{W}_1(E_j)$ ($j = 1, 2$), and (ii) the minimum is taken over
all $D_1, D_2 \in [0, d_{\max}]$ such that $\lambda D_1 + (1-\lambda) D_2 \le D$. In particular,

$$R_a(D|E) \ge \max_{\substack{\lambda, E_2, W_2 \in \mathcal{W}_1(E_2): \\ \lambda E^* + (1-\lambda) E_2 \le E}} \ \min_{\substack{D_1, D_2: \\ \lambda D_1 + (1-\lambda) D_2 \le D}} R_{HB}(D_1, D_2|W^*, W_2) \tag{17}$$

holds. We also have

$$R_a(D|E) \le \min_{V \in \mathcal{V}(E, D)} I(P_X, V). \tag{18}$$

Proof: See Section V. ■
Remark 9: Note that (17) is obtained from (16) by letting $E_1 = E^*$ and $W_1 = W^*$. Thus, (16) is tighter than
(17). However, we cannot give a single-letter expression for the right hand side of (16), while we can for (17) by
using Proposition 3.

Remark 10: A close inspection of the proof reveals that we can generalize (16) by considering one-to-$m$ lossy
source coding with side information at the decoders. That is, in the same manner as (16), we can show that

$$R_a(D|E) \ge \max_{\boldsymbol{\lambda}, \mathbf{E}, \mathbf{W}} \min_{\mathbf{D}} R_{HB}(D_1, D_2, \ldots, D_m|W_1, W_2, \ldots, W_m), \tag{19}$$

where (i) the maximum is taken over all $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_m)$,
$\mathbf{E} = (E_1, \ldots, E_m)$, and $\mathbf{W} = (W_1, \ldots, W_m)$ such that $\sum_j \lambda_j = 1$,
$\sum_j \lambda_j E_j \le E$, and $W_j \in \mathcal{W}_1(E_j)$ ($j = 1, \ldots, m$), and (ii) the minimum is taken over all
$\mathbf{D} = (D_1, \ldots, D_m)$ such that $\sum_j \lambda_j D_j \le D$. The authors conjecture that the bound (19) is not
tighter than (16), i.e., is equivalent to (16).

Remark 11: The upper bound in (18) is derived by using the side-information only for the estimation at the
decoder and not for the bin coding, which is the difference between (12) and (18).

From Theorem 8, we have several corollaries. First, let us set the parameters in (17) as $\lambda = 0$ and $E_2 = E$.
Then, we have

$$R_a(D|E) \ge \max_{W_2 \in \mathcal{W}_1(E)} \min_{D_2 \le D} R_{HB}(D_1, D_2|W^*, W_2) = \max_{W_2 \in \mathcal{W}_1(E)} R_{HB}(d_{\max}, D|W^*, W_2).$$

Note that $R_{HB}(d_{\max}, D|W^*, W_2)$ equals the Wyner-Ziv rate-distortion function $R_{WZ}(D|W_2)$. This fact gives
the following corollary.
Corollary 12: We have

$$R_a(D|E) \ge \max_{W \in \mathcal{W}_1(E)} R_{WZ}(D|W).$$

Next, let us consider the lossless case. Note that $R_{HB}(0, 0|W^*, W_2)$ equals the minimum coding rate such
that the decoder $\psi_n^{HB_1}$ without side information can reproduce $X^n$ losslessly. Thus, for any side information
channel $W_2$,

$$R_{HB}(0, 0|W^*, W_2) = H(X).$$

Since $R_a(0|E) \le H(X)$, we have the following corollary.
Corollary 13: For $D = 0$ and $E > 0$, we have$^{1}$

$$R_a(0|E) = H(X). \tag{20}$$

This corollary indicates that the side information is completely useless when $D = 0$ and $E > 0$. It should be
emphasized that Corollary 12 does not give Corollary 13 in general. This means that our result (17) is tighter than
Corollary 12.

Lastly, we show that our bound (17) gives another trivial bound. Assume that $E \ge E^*$. Then, we can set
$\lambda = 1$ in (17) and have

$$R_a(D|E) \ge \max_{W_2 \in \mathcal{W}_1(E_2)} \min_{D_1 \le D} R_{HB}(D_1, D_2|W^*, W_2) = \max_{W_2 \in \mathcal{W}_1(E_2)} R_{HB}(D, d_{\max}|W^*, W_2).$$

Furthermore, for any side information channel $W_2$, it is apparent that

$$R_{HB}(D, d_{\max}|W^*, W_2) \ge R(D),$$

where $R(D)$ is the rate-distortion function for one-to-one lossy coding without side information. Hence, if
$E \ge E^*$, we have

$$R_a(D|E) \ge R(D).$$

Since $R_a(D|E) \le R(D)$ always holds, we have the following corollary.
Corollary 14: If $E \ge E^*$, then we have

$$R_a(D|E) = R(D).$$
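Corollary 14 turns on the threshold $E^*$, the smallest expected distortion achievable by a fixed side-information symbol. The following minimal sketch computes $E^*$ and a minimizing symbol $y^*$ for a finite source; the function name `best_constant_guess` and the dictionary representation of $P_X$ are assumptions made for illustration.

```python
def best_constant_guess(px, e, side_alphabet):
    """Return (E_star, y_star): the minimum over y of sum_x P_X(x) e(x, y),
    and one symbol y attaining it."""
    costs = {y: sum(p * e(x, y) for x, p in px.items()) for y in side_alphabet}
    y_star = min(costs, key=costs.get)
    return costs[y_star], y_star

def hamming(x, y):
    """Hamming distortion: 0 if the symbols agree, 1 otherwise."""
    return 0 if x == y else 1

e_star, y_star = best_constant_guess({0: 0.5, 1: 0.5}, hamming, [0, 1])
print(e_star)  # 0.5 for the uniform binary source with Hamming distortion
```

For the uniform binary source with Hamming distortion this gives $E^* = 1/2$, so in that setting Corollary 14 only applies when $E \ge 1/2$, and the side-information can matter only for smaller $E$.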

C. Binary Hamming Example 

To provide some insight into our results, we consider the binary Hamming example, i.e., we assume that
$\mathcal{X} = \mathcal{Y} = \hat{\mathcal{X}} = \{0, 1\}$, $P_X(0) = P_X(1) = \frac{1}{2}$, and

$$e(x, y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{else,} \end{cases} \qquad
d(x, \hat{x}) = \begin{cases} 0 & \text{if } x = \hat{x} \\ 1 & \text{else.} \end{cases}$$

In this section, we assume that $E \le \frac{1}{2}$.

We first consider the maximum distortion class. In this case, the set $\mathcal{W}_1(E)$ can be parametrized by two
parameters $(\alpha, \beta)$ satisfying

$$\frac{\alpha + \beta}{2} \le E$$

(see Fig. 3 and Fig. 4).

$^{1}$ We need the condition $E > 0$ because we need to take $\lambda > 0$ in (17).

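By the concavity of $R_{WZ}(D|W, E)$ with respect to $W$ (Lemma 4) and by the symmetry with respect to $\alpha$ and
$\beta$, we have

$$\mathop{\mathrm{argmax}}_{W \in \mathcal{W}_1(E)} R_{WZ}(D|W, E) = \mathrm{BSC}(E).$$

Let $0, 1 \in \mathcal{U}$ be the constant functions that output 0 or 1 irrespective of $y$, and let $y$ be the function
that outputs $y$ itself. Similarly, let $\bar{y}$ be the function that outputs $y \oplus 1$. In the binary Hamming case,
$\mathcal{U} = \{0, 1, y, \bar{y}\}$. For $W^* = \mathrm{BSC}(E)$, it is known that

$$R_{WZ}(D|W^*) = \min_{V \in \mathcal{V}(W^*, D)} \phi(V, W^*)$$

is achieved by a test channel of the form

$$\hat{V}(u|x) = \begin{cases} \lambda(1-q) & \text{if } u = x \\ \lambda q & \text{if } u = x \oplus 1 \\ 1 - \lambda & \text{if } u = y \end{cases}$$

for some $0 \le \lambda \le 1$ and $0 \le q \le \frac{1}{2}$ ($\lambda$ represents the time sharing). In this case, the
distortion is given by

$$\lambda \sum_{x, \hat{x}} P_X(x) V_q(\hat{x}|x) d(x, \hat{x}) + (1-\lambda) \sum_{x, y} P_X(x) W^*(y|x) d(x, y)
= \lambda \sum_{x, \hat{x}} P_X(x) V_q(\hat{x}|x) d(x, \hat{x}) + (1-\lambda) E \le D,$$

where $V_q = \mathrm{BSC}(q)$. Since every channel $W \in \mathcal{W}_1(E)$ satisfies

$$\sum_{x, y} P_X(x) W(y|x) d(x, y) = \sum_{x, y} P_X(x) W(y|x) e(x, y) \le E,$$

we find that $\hat{V} \in \mathcal{V}(E, D)$. Thus, the matching condition of Corollary 6 is satisfied for this binary
Hamming example.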
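The membership $\hat{V} \in \mathcal{V}(E, D)$ can be spot-checked numerically. The sketch below evaluates $d(\hat{V}, W)$ by direct summation over the functions $u \in \{0, 1, y\}$ for a binary channel with crossover probabilities $W(1|0) = \alpha$ and $W(0|1) = \beta$; this reading of the parametrization is an assumption, since Fig. 3 is not reproduced here, and the function name and the grid are likewise illustrative.

```python
def distortion(lam, q, alpha, beta):
    """d(V, W) for the time-sharing test channel with parameters (lam, q) and the
    binary channel with W(1|0) = alpha, W(0|1) = beta (assumed parametrization),
    under a uniform source and Hamming distortion."""
    functions = {"0": lambda y: 0, "1": lambda y: 1, "y": lambda y: y}
    total = 0.0
    for x in (0, 1):
        # Test channel: constant u = x, constant u = x XOR 1, or the identity u = y.
        v = {str(x): lam * (1 - q), str(1 - x): lam * q, "y": 1 - lam}
        w = {0: 1 - alpha, 1: alpha} if x == 0 else {0: beta, 1: 1 - beta}
        for u, p_u in v.items():
            for y, p_y in w.items():
                if functions[u](y) != x:
                    total += 0.5 * p_u * p_y
    return total

lam, q, E = 0.6, 0.05, 0.1
target = lam * q + (1 - lam) * E
grid = [2 * E * i / 20 for i in range(21)]
worst = max(distortion(lam, q, a, b)
            for a in grid for b in grid if (a + b) / 2 <= E + 1e-12)
print(worst <= target + 1e-12)
```

On this grid the distortion never exceeds $\lambda q + (1-\lambda)E$, the value attained at $\mathrm{BSC}(E)$, which is consistent with the argument above.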

Next, we consider the average distortion class. We evaluate the upper bound (18). We first fix $W^*$ to be
$\mathrm{BSC}(E)$. Note that

$$\min_{V \in \mathcal{V}(E, D)} I(P_X, V) \ge \min_{V \in \mathcal{V}(W^*, D)} I(P_X, V). \tag{21}$$

For a test channel $V \in \mathcal{V}(W^*, D)$, let $\bar{V}$ be a test channel such that
$\bar{V}(u|x) = V(u \oplus 1|x \oplus 1)$ for $u \in \bar{\mathcal{U}}$ and $\bar{V}(u|x) = V(u|x \oplus 1)$ for
$u \in \{y, \bar{y}\}$. Then, by the symmetry of the BSC and the source $P_X$, we have $\bar{V} \in \mathcal{V}(W^*, D)$ and
$I(P_X, \bar{V}) = I(P_X, V)$. By the convexity of the mutual information with respect to the channel, we have

$$I(P_X, \tilde{V}) \le \frac{1}{2} I(P_X, V) + \frac{1}{2} I(P_X, \bar{V}),$$

where $\tilde{V} = \frac{1}{2} V + \frac{1}{2} \bar{V}$. This means that the minimum in the right hand side of (21) is
achieved by a symmetric test channel, i.e., $V(u|x) = V(u \oplus 1|x \oplus 1)$ for $u \in \bar{\mathcal{U}}$ and
$V(u|x) = V(u|x \oplus 1)$ for $u \in \{y, \bar{y}\}$. Furthermore, for $E \le \frac{1}{2}$, we can assume that
$V(\bar{y}|x) = 0$ because using $\bar{y}$ only makes the distortion larger. We also note that such a symmetric test
channel satisfies $V \in \mathcal{V}(E, D)$. Thus, the equality in (21) actually holds. Consequently, the
upper bound on $R_a(D|E)$ in this example is the time sharing between the ordinary rate-distortion function and the
distortion that can be achieved only by the estimation, i.e., the point $(E, 0)$.

Fig. 4. The set of all channels in $\mathcal{W}_1(E)$.
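The time-sharing description can be evaluated numerically. As a rough sketch, assume the symmetric test channel uses a constant reproduction through $\mathrm{BSC}(q)$ with probability $\lambda$ and the identity function $u = y$ with probability $1 - \lambda$; its mutual information is then $\lambda(1 - h(q))$ and its worst-case distortion over $\mathcal{W}_1(E)$ is $\lambda q + (1-\lambda)E$. The grid search below, with illustrative function names and grid resolution, minimizes the former subject to the latter being at most $D$.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def ra_upper_bound(D, E, steps=400):
    """Grid-search estimate of min lam * (1 - h2(q)) over 0 <= lam <= 1 and
    0 <= q <= 1/2, subject to lam * q + (1 - lam) * E <= D."""
    best = float("inf")
    for i in range(steps + 1):
        lam = i / steps
        for j in range(steps + 1):
            q = 0.5 * j / steps
            if lam * q + (1 - lam) * E <= D + 1e-12:
                best = min(best, lam * (1.0 - h2(q)))
    return best

E = 0.1
for D in (0.0, 0.05, 0.1):
    print(D, round(ra_upper_bound(D, E), 4))
```

At $D = 0$ the estimate equals $H(X) = 1$ bit, in line with Corollary 13, and at $D = E$ it drops to zero because estimation from the side-information alone suffices.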

D. Discussion on Universality 

In this section, we discuss the definitions of universal Wyner-Ziv coding. We also discuss the relation
between universal Wyner-Ziv coding and the Heegard-Berger problem.

Let us consider the binary Hamming case as in the previous section. Let $X$ be the uniform random variable on
$\{0, 1\}$. Let $W_1$ be the binary channel in Fig. 3 with $\alpha = 2E$ and $\beta = 0$, and let $W_2$ be the binary
channel in Fig. 3 with $\alpha = 0$ and $\beta = 2E$. Obviously, the Wyner-Ziv rate-distortion functions for $W_1$ and
$W_2$ coincide, i.e.,

$$R_{WZ}(D|W_1) = R_{WZ}(D|W_2).$$

It should also be noted that $\mathcal{W}_1(E)$ is the convex hull of the set $\{W_1, W_2\}$.

As we have mentioned in Section II-B, in the ordinary definition of universality, we require that there
exists a universal code that works well for every $W \in \mathcal{W}_{WZ}(R, D)$ instead of $\mathcal{W}_1(E)$. If we set
$R = R_{WZ}(D|W_1) = R_{WZ}(D|W_2)$, then we have

$$W_1, W_2 \in \mathcal{W}_{WZ}(R, D).$$

Thus, at least, we have to construct a code that is universal for both $W_1$ and $W_2$, which can be regarded as a
special case of the Heegard-Berger problem [12]. The rate-distortion function $R_{HB}(D, D|W_1, W_2)$ is not known,
but we have a trivial lower bound

$$R_{HB}(D, D|W_1, W_2) \tag{22}$$

$$\ge R_{WZ}(D|W_1) = R_{WZ}(D|W_2). \tag{23}$$

The equality in (23) is a necessary condition for universal coding in the sense of $\mathcal{W}_{WZ}(R, D)$ to be
possible. In other words, if the strict inequality holds in (23), universal coding in the sense of
$\mathcal{W}_{WZ}(R, D)$ is impossible. Showing whether the equality holds or not is an important open problem.

A straightforward upper bound on $R_{HB}(D, D|W_1, W_2)$ can be derived as follows. Let
$V_s \in \mathcal{P}(\mathcal{U}|\mathcal{X})$ be a symmetric test channel such that

$$V_s(0|0) = V_s(1|1),$$
$$V_s(y|0) = V_s(y|1),$$
$$V_s(\bar{y}|0) = V_s(\bar{y}|1) = 0.$$

Then, by taking $V_s(0|0)$ appropriately, we have

$$V_s \in \mathcal{V}(W_1, D) \cap \mathcal{V}(W_2, D).$$

The achievability of

$$\phi(V_s, W_1) = \phi(V_s, W_2)$$

can also be derived from the known upper bound in [12]. Thus, we have

$$R_{HB}(D, D|W_1, W_2) \le \bar{R}_{HB}(D, D|W_1, W_2) := \min_{V_s \in \mathcal{V}(W_1, D) \cap \mathcal{V}(W_2, D)} \phi(V_s, W_1).$$

Numerical calculations of $\bar{R}_{HB}(D, D|W_1, W_2)$ and $R_{WZ}(D|W_1)$ are compared in Fig. 5. For comparison, we also
plotted $R_m(D|E)$ in the figure. As can be seen from the figure, $R_m(D|E)$ is much larger than
$\bar{R}_{HB}(D, D|W_1, W_2)$. This is because $\mathcal{W}_1(E)$ involves $\mathrm{BSC}(E)$.

Fig. 5. Comparison among $\bar{R}_{HB}(D, D|P_{Y|X}, P_{Z|X})$, $R_{WZ}(D|P_{Y|X})$, and $R_m(D|E)$. The red solid line is
$R_{WZ}(D|P_{Y|X})$. The green dashed line is $\bar{R}_{HB}(D, D|P_{Y|X}, P_{Z|X})$. The blue dashed line is $R_m(D|E)$.

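IV. Proof of Theorem 5

A. Proof of Converse Part

For any $\delta > 0$, let $W \in \mathcal{W}_1(E - \delta)$. Then, by the law of large numbers and the definition of
$\mathcal{W}_m(E)$, we have $\{W^{\times n}\}_{n=1}^{\infty} \in \mathcal{W}_m(E)$, which implies

$$R_m(D|E) \ge \max_{W \in \mathcal{W}_1(E - \delta)} R_{WZ}(D|W).$$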






Since this inequality holds for arbitrary $\delta > 0$, we have (10). ■

B. Proof of Direct Part

Note that the function $\phi(\cdot, W)$ is convex for fixed $W$, $\phi(V, \cdot)$ is concave for fixed $V$,
and $\mathcal{W}_1(E)$ and $\mathcal{V}(E, D)$ are convex sets. Thus, (13) is derived from (12) by applying the saddle point
theorem [27]. We prove (12) in three steps. First, we prove that there exists a universal code for i.i.d. channels.
Then, we show that there exists a randomized universal code for permutation invariant channels. Finally, we
de-randomize the randomized universal code by using the technique of [28], [29].

1) Code for i.i.d. Channel: In this section, we construct a universal Wyner-Ziv code for a fixed test channel
such that it works well for every $W \in \mathcal{W}_1(E) \cap \mathcal{P}_n(\mathcal{Y}|\mathcal{X})$. We construct a
universal Wyner-Ziv code by using the output statistics of random binning argument recently introduced in [30]. We note
that a universal Wyner-Ziv code can also be constructed from the coding method in [31].

Let us fix $V \in \mathcal{V}(E, D)$. We use two kinds of bin codings $f_n : \mathcal{U}^n \to \mathcal{S}_n$ and
$g_n : \mathcal{U}^n \to \mathcal{L}_n$. Let $F_n$ and $G_n$ be random bin codings. For arbitrarily small $\delta > 0$, let
$R_f, R_g > 0$ be real numbers such that

$$R_f = H(U|X) - \delta, \tag{24}$$

$$R_g = \max_{W \in \mathcal{W}_1(E)} \phi(V, W) + 2\delta \tag{25}$$

$$= \max_{W \in \mathcal{W}_1(E)} I(U; X|Y) + 2\delta. \tag{26}$$

Since

$$I(U; X|Y) = H(U|Y) - H(U|X, Y) = H(U|Y) - H(U|X),$$

we have

$$R_f + R_g = \max_{W \in \mathcal{W}_1(E)} H(U|Y) + \delta. \tag{27}$$

Let $|\mathcal{S}_n| = \lfloor 2^{nR_f} \rfloor$ and $|\mathcal{L}_n| = \lceil 2^{nR_g} \rceil$.

From (27), we find that the sum rate $R_f + R_g$ is sufficiently large for the Slepian-Wolf coding. We use the
following lemma on universal Slepian-Wolf coding.

Lemma 15: For sufficiently large $n$, there exist $\mu_1 > 0$ and a universal decoder
$\kappa_n : \mathcal{Y}^n \times \mathcal{S}_n \times \mathcal{L}_n \to \mathcal{U}^n$ such that

$$\mathbb{E}_{F_n G_n}[P_{\mathrm{err}}(F_n, G_n, W)] \le 2^{-\mu_1 n}$$

for every $W \in \mathcal{W}_1(E) \cap \mathcal{P}_n(\mathcal{Y}|\mathcal{X})$, where $P_{\mathrm{err}}(F_n, G_n, W)$ is the
error probability of the Slepian-Wolf coding for channel $W$ when the bin codings $(F_n, G_n)$ are used.
Proof: The lemma is proved in exactly the same manner as [?]. The only modifications are that we use
random bin coding instead of random linear coding$^{2}$, and that we evaluate the ensemble average of the error
probability. ■

From (24), we find that the rate $R_f$ is sufficiently small to generate a uniform random variable that is independent
of $X^n$. We use the privacy amplification lemma (Lemma 27) described in Appendix B.

We construct a code as follows. Let $P_{\hat{U}^n|Y^n S_n L_n}$ be the distribution describing the Slepian-Wolf decoder.
Let

$$P_{S_n L_n U^n X^n Y^n \hat{U}^n}(s_n, \ell_n, u^n, x^n, y^n, \hat{u}^n)
= P_{S_n X^n}(s_n, x^n) P_{U^n|S_n X^n}(u^n|s_n, x^n) P_{L_n|U^n}(\ell_n|u^n)
P_{Y^n|X^n}(y^n|x^n) P_{\hat{U}^n|Y^n S_n L_n}(\hat{u}^n|y^n, s_n, \ell_n)$$

and

$$\hat{P}_{S_n L_n U^n X^n Y^n \hat{U}^n}(s_n, \ell_n, u^n, x^n, y^n, \hat{u}^n)
= P_{S_n}(s_n) P_{X^n}(x^n) P_{U^n|S_n X^n}(u^n|s_n, x^n) P_{L_n|U^n}(\ell_n|u^n)
P_{Y^n|X^n}(y^n|x^n) P_{\hat{U}^n|Y^n S_n L_n}(\hat{u}^n|y^n, s_n, \ell_n).$$

The distribution $P_{S_n L_n U^n X^n Y^n \hat{U}^n}$ describes a virtual coding scheme in which the encoder sends both
$F_n(U^n)$ and $G_n(U^n)$. The distribution $\hat{P}_{S_n L_n U^n X^n Y^n \hat{U}^n}$ describes a real coding scheme in which
the encoder sends only $G_n(U^n)$ and uses the common randomness $S_n$ that is shared with the decoder. Note that
$P_{U^n|S_n X^n}(u^n|s_n, x^n)$ is a randomized quantizer, which is derived from the bin coding $f_n$ and the test channel
$P_{U|X}$ via $P_{U^n X^n S_n}$. From Lemma 27 and the fact that the variational distance does not increase by data
processing or marginalization, we have

$$\mathbb{E}_{F_n G_n}\Big[ \big\| P_{S_n U^n X^n Y^n \hat{U}^n} - \hat{P}_{S_n U^n X^n Y^n \hat{U}^n} \big\| \Big] \le 2^{-\mu_2 n}$$

for some $\mu_2 > 0$. By a large deviation bound such as the Bernstein inequality, there exists $\mu_3 > 0$ such that

$$P_{U^n X^n Y^n}(\{ d_n(x^n, u^n(y^n)) > D + \delta \}) \le 2^{-\mu_3 n}. \tag{28}$$

It should be noted that the bound (28) is uniform with respect to the channel $W$. By Lemma 15 and (28), we have

$$\begin{aligned}
&\mathbb{E}_{F_n G_n}\big[ P_{S_n U^n X^n Y^n \hat{U}^n}(\{ d_n(x^n, \hat{u}^n(y^n)) > D + \delta \}) \big] \\
&\le \mathbb{E}_{F_n G_n}\big[ P_{S_n U^n X^n Y^n \hat{U}^n}(\{ d_n(x^n, u^n(y^n)) > D + \delta \text{ or } \hat{u}^n \ne u^n \}) \big] \\
&\le 2^{-\mu_3 n} + 2^{-\mu_1 n}.
\end{aligned}$$

Since

$$\big| P_{S_n U^n X^n Y^n \hat{U}^n}(A) - \hat{P}_{S_n U^n X^n Y^n \hat{U}^n}(A) \big|
\le \big\| P_{S_n U^n X^n Y^n \hat{U}^n} - \hat{P}_{S_n U^n X^n Y^n \hat{U}^n} \big\|$$

for any set $A$, we have

$$\mathbb{E}_{F_n G_n}\big[ \hat{P}_{S_n U^n X^n Y^n \hat{U}^n}(\{ d_n(x^n, \hat{u}^n(y^n)) > D + \delta \}) \big]
\le 3 \cdot 2^{-n \min[\mu_1, \mu_2, \mu_3]}.$$

Since

$$|\mathcal{W}_1(E) \cap \mathcal{P}_n(\mathcal{Y}|\mathcal{X})| \le (n+1)^{|\mathcal{X}||\mathcal{Y}|},$$

there exists at least one realization $(f_n, g_n, s_n)$ of $(F_n, G_n, S_n)$ such that

$$\hat{P}_{U^n X^n Y^n \hat{U}^n|S_n}(\{ d_n(x^n, \hat{u}^n(y^n)) > D + \delta \}|s_n)
\le 3 (n+1)^{|\mathcal{X}||\mathcal{Y}|} 2^{-n \min[\mu_1, \mu_2, \mu_3]}$$

for every $W \in \mathcal{W}_1(E) \cap \mathcal{P}_n(\mathcal{Y}|\mathcal{X})$. Furthermore, let $K_n$ be a random variable
that simulates the randomized quantizer $P_{U^n|S_n X^n}$. Then, we can also eliminate this randomness in a similar manner
as above.

$^{2}$ We can also use random linear coding instead of random bin coding, because Lemma 27 holds under the condition
that $f_n$ is chosen from a universal hash family, and the random linear coding ensemble is a universal hash family.
In summary, we have shown the following. 

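Lemma 16: For any $V \in \mathcal{V}(E, D)$ and any $\delta > 0$, there exists a universal code $(\varphi_n, \psi_n)$ and a constant $\mu > 0$ such that

\[ \frac{1}{n} \log |\mathcal{M}_n| \le \max_{W \in \mathcal{W}_i(E)} \phi(V, W) + 2\delta \]

and

\[ \Pr\{ d_n(X^n, \psi_n(\varphi_n(X^n), Y^n)) > D + \delta \} \le 2^{-\mu n} \]

for every $W \in \mathcal{W}_i(E) \cap \mathcal{P}_n(\mathcal{Y}|\mathcal{X})$ provided that $n$ is sufficiently large.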

2) Code for Permutation Invariant Channel: In Section IV-B1, we constructed a universal Wyner-Ziv code $(\varphi_n, \psi_n)$ for a fixed test channel such that it works well for every $W \in \mathcal{W}_i(E) \cap \mathcal{P}_n(\mathcal{Y}|\mathcal{X})$. In this section, we apply this code to the channels in $\mathcal{W}_m(E)$. Let $\pi_n$ be a random permutation on $\{1, \ldots, n\}$. We first apply the random permutation to the sequence $(X^n, Y^n)$ and then use $(\varphi_n, \psi_n)$. It should be noted that the encoder and the decoder agree on a realization of the random permutation in this section. We denote

\[ P_X^n \cdot W^n(x^n, y^n) = P_X^n(x^n) W^n(y^n|x^n). \]

Note that

\[ E_{\pi_n}\bigl[ P_X^n \cdot W^n(\pi_n(x^n), \pi_n(y^n)) \bigr] = E_{\pi_n}\bigl[ P_X^n(\pi_n(x^n)) W^n(\pi_n(y^n)|\pi_n(x^n)) \bigr] = P_X^n(x^n) E_{\pi_n}\bigl[ W^n(\pi_n(y^n)|\pi_n(x^n)) \bigr], \]

and we consider the average performance with respect to the permutation. Thus, without loss of generality, we can assume that $W^n$ is permutation invariant, i.e., $W^n(y^n|x^n) = W^n(\tilde{y}^n|\tilde{x}^n)$ if $P_{x^n y^n} = P_{\tilde{x}^n \tilde{y}^n}$.
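The averaging step above can be checked numerically on a toy example. The following is a minimal sketch, assuming a binary alphabet, block length 3, and a randomly generated (non-i.i.d.) channel; the helper names `random_channel`, `permutation_average`, and `is_permutation_invariant` are hypothetical and not part of the paper. It averages $W^n(\pi_n(y^n)|\pi_n(x^n))$ over all permutations and verifies that the result depends on $(x^n, y^n)$ only through the joint type.

```python
import itertools
import random

def permute(seq, perm):
    """Apply the index permutation `perm` to the sequence `seq`."""
    return tuple(seq[i] for i in perm)

def random_channel(n, alphabet, rng):
    """Arbitrary channel: one random distribution over y^n for each x^n."""
    seqs = list(itertools.product(alphabet, repeat=n))
    channel = {}
    for x in seqs:
        weights = [rng.random() for _ in seqs]
        total = sum(weights)
        for y, w in zip(seqs, weights):
            channel[(x, y)] = w / total  # W^n(y^n | x^n)
    return channel

def permutation_average(channel, n):
    """E_pi[ W^n(pi(y^n) | pi(x^n)) ] for a uniformly random permutation pi."""
    perms = list(itertools.permutations(range(n)))
    return {
        (x, y): sum(channel[(permute(x, p), permute(y, p))] for p in perms) / len(perms)
        for (x, y) in channel
    }

def joint_type(x, y):
    """Joint type of (x^n, y^n), represented by the sorted list of symbol pairs."""
    return tuple(sorted(zip(x, y)))

def is_permutation_invariant(channel, tol=1e-12):
    """True if pairs with the same joint type have the same conditional probability."""
    seen = {}
    for (x, y), prob in channel.items():
        ref = seen.setdefault(joint_type(x, y), prob)
        if abs(ref - prob) > tol:
            return False
    return True

rng = random.Random(0)
W = random_channel(3, (0, 1), rng)
W_avg = permutation_average(W, 3)
print(is_permutation_invariant(W), is_permutation_invariant(W_avg))
```

The enumeration over all $n!$ permutations is only feasible for very small $n$; it is meant as an illustration of the identity, not as part of the coding scheme.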
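Lemma 17: Let $A_n \subset \mathcal{X}^n \times \mathcal{Y}^n$. Suppose that

\[ P_X^n \cdot W^{\times n}(A_n^c) \le \varepsilon \]

for every $W \in \mathcal{W}_i(E + \delta e_{\max}) \cap \mathcal{P}_n(\mathcal{Y}|\mathcal{X})$. Then, for any conditional type $W \in \mathcal{W}_n(T_{X,\delta}, E)$, we have

\[ \sum_{x^n \in T_{X,\delta}} P_X^n(x^n) 1[W \in \mathcal{P}_n(\mathcal{Y}|P_{x^n})] \sum_{y^n \in T_W(x^n)} \frac{1}{|T_W(x^n)|} 1[(x^n, y^n) \in A_n^c] \le (n+1)^{|\mathcal{X}||\mathcal{Y}|} \varepsilon. \]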

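Proof: For any $W \in \mathcal{P}_n(\mathcal{Y}|P_{x^n})$, note that

\[ W^{\times n}(T_W(x^n)|x^n) \ge \frac{1}{(n+1)^{|\mathcal{X}||\mathcal{Y}|}} 2^{-n D(W \| W | P_{x^n})} \tag{29} \]

\[ = \frac{1}{(n+1)^{|\mathcal{X}||\mathcal{Y}|}}. \tag{30} \]

From (45), we have $W \in \mathcal{W}_i(E + \delta e_{\max})$ for every $W \in \mathcal{W}_n(T_{X,\delta}, E)$. Thus, for every $W \in \mathcal{W}_n(T_{X,\delta}, E)$ we have

\[ \begin{aligned}
\varepsilon &\ge P_X^n \cdot W^{\times n}(A_n^c) \\
&\ge \sum_{x^n \in T_{X,\delta}} P_X^n(x^n) 1[W \in \mathcal{P}_n(\mathcal{Y}|P_{x^n})] \sum_{y^n \in T_W(x^n)} W^{\times n}(T_W(x^n)|x^n) \frac{1}{|T_W(x^n)|} 1[(x^n, y^n) \in A_n^c] \\
&\ge \sum_{x^n \in T_{X,\delta}} P_X^n(x^n) 1[W \in \mathcal{P}_n(\mathcal{Y}|P_{x^n})] \sum_{y^n \in T_W(x^n)} \frac{1}{(n+1)^{|\mathcal{X}||\mathcal{Y}|}} \frac{1}{|T_W(x^n)|} 1[(x^n, y^n) \in A_n^c],
\end{aligned} \]

which implies the statement of the lemma. ■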
Lemma 18: Suppose that the code $(\varphi_n, \psi_n)$ satisfies

\[ \Pr\{ d_n(X^n, \psi_n(\varphi_n(X^n), Y^n)) > D + \delta \} \le \varepsilon \]

for every i.i.d. channel $W^{\times n}$ such that $W \in \mathcal{W}_i(E + \delta e_{\max}) \cap \mathcal{P}_n(\mathcal{Y}|\mathcal{X})$, where $(X^n, Y^n) \sim P_X^n \cdot W^{\times n}$. Then, we have

\[ \Pr\{ d_n(X^n, \psi_n(\varphi_n(X^n), Y^n)) > D + \delta \} \le (n+1)^{2|\mathcal{X}||\mathcal{Y}|} \varepsilon + P_X^n(T_{X,\delta}^c) + \delta_1 \]

for every permutation invariant (not necessarily i.i.d.) $W^n$ satisfying

\[ \Pr\{ e_n(X^n, Y^n) > E \} \le \delta_1, \]

where $(X^n, Y^n) \sim P_X^n \cdot W^n$.

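Proof: Suppose that $(X^n, Y^n) \sim P_X^n \cdot W^n$. By using Lemma 17 for

\[ A_n := \{ (x^n, y^n) : d_n(x^n, \psi_n(\varphi_n(x^n), y^n)) \le D + \delta \}, \]

we have

\[ \begin{aligned}
&\Pr\{ d_n(X^n, \psi_n(\varphi_n(X^n), Y^n)) > D + \delta \} \\
&\le \sum_{x^n \notin T_{X,\delta}} P_X^n(x^n) \\
&\quad + \sum_{W \in \mathcal{W}_n(T_{X,\delta}, E)} \sum_{x^n \in T_{X,\delta}} P_X^n(x^n) 1[W \in \mathcal{P}_n(\mathcal{Y}|P_{x^n})] \sum_{y^n \in T_W(x^n)} W^n(T_W(x^n)|x^n) \frac{1}{|T_W(x^n)|} 1[(x^n, y^n) \in A_n^c] \\
&\quad + \sum_{W \notin \mathcal{W}_n(T_{X,\delta}, E)} \sum_{x^n \in T_{X,\delta}} P_X^n(x^n) 1[W \in \mathcal{P}_n(\mathcal{Y}|P_{x^n})] W^n(T_W(x^n)|x^n) \\
&\le P_X^n(T_{X,\delta}^c) + \sum_{W \in \mathcal{W}_n(T_{X,\delta}, E)} \sum_{x^n \in T_{X,\delta}} P_X^n(x^n) 1[W \in \mathcal{P}_n(\mathcal{Y}|P_{x^n})] \sum_{y^n \in T_W(x^n)} \frac{1}{|T_W(x^n)|} 1[(x^n, y^n) \in A_n^c] \\
&\quad + \Pr\{ e_n(X^n, Y^n) > E \} \\
&\le (n+1)^{2|\mathcal{X}||\mathcal{Y}|} \varepsilon + P_X^n(T_{X,\delta}^c) + \delta_1,
\end{aligned} \]

where we used $W^n(T_W(x^n)|x^n) \le 1$ to bound the second term, we used the fact

\[ W \notin \mathcal{W}_n(P_{x^n}, E) \implies e(P_{x^n}, W) > E \]

and (43) to bound the third term in the second inequality, and we used Lemma 17 to bound the second term in the third inequality. ■

By combining Lemma 16 and Lemma 18, and by noting the definition of $\mathcal{W}_m(E)$, we have the following.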

Lemma 19: For any $V \in \mathcal{V}(E + \delta e_{\max}, D)$, any $\delta > 0$, and any $\varepsilon > 0$, there exists a universal code $(\varphi_n, \psi_n)$ such that

\[ \frac{1}{n} \log |\mathcal{M}_n| \le \max_{W \in \mathcal{W}_i(E + \delta e_{\max})} \phi(V, W) + 2\delta \]

and

\[ E_{\pi_n}\bigl[ \Pr\{ d_n(\pi_n(X^n), \psi_n(\varphi_n(\pi_n(X^n)), \pi_n(Y^n))) > D + \delta \} \bigr] \le \varepsilon \tag{31} \]

for every $W \in \mathcal{W}_m(E)$ provided that $n$ is sufficiently large.

3) De-Randomization: Now we reduce the size of random permutation by using the de-randomization technique. 

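Lemma 20: Suppose that $(\varphi_n, \psi_n)$ satisfies (31). Then, for arbitrary $\delta_2, \gamma > 0$, there exist $m_n = 2^{\delta_2 n}$ permutations $\{\pi_n^{(1)}, \ldots, \pi_n^{(m_n)}\}$ such that

\[ \frac{1}{m_n} \sum_{i=1}^{m_n} \Pr\{ d_n(\pi_n^{(i)}(X^n), \psi_n(\varphi_n(\pi_n^{(i)}(X^n)), \pi_n^{(i)}(Y^n))) > D + \delta \} \le \varepsilon + \gamma \]

provided that $n$ is sufficiently large.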

Proof: For a permutation $\pi_n$ and $(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n$, we denote

\[ I(\pi_n, x^n, y^n) = 1[ d_n(\pi_n(x^n), \psi_n(\varphi_n(\pi_n(x^n)), \pi_n(y^n))) > D + \delta ]. \]

Let $\pi_n^{(1)}, \ldots, \pi_n^{(m_n)}$ be randomly generated permutations, and let $\bar{I}(x^n, y^n) = E_{\pi_n}[I(\pi_n, x^n, y^n)]$. Then, by using Lemma 28 for $A_i = I(\pi_n^{(i)}, x^n, y^n)$, $b = 1$, and $\alpha = \gamma/2$, we have

\[ \Pr\Bigl\{ \frac{1}{m_n} \sum_{i=1}^{m_n} I(\pi_n^{(i)}, x^n, y^n) > \bar{I}(x^n, y^n) + \gamma \Bigr\} \le \exp\{ -(\gamma^2/4) m_n \}. \]

Furthermore, by using the union bound, we have

\[ \Pr\Bigl\{ \exists (x^n, y^n) : \frac{1}{m_n} \sum_{i=1}^{m_n} I(\pi_n^{(i)}, x^n, y^n) > \bar{I}(x^n, y^n) + \gamma \Bigr\} \le |\mathcal{X}^n||\mathcal{Y}^n| \exp\{ -(\gamma^2/4) m_n \}. \tag{32} \]

Since $\exp\{ -(\gamma^2/4) m_n \}$ converges to $0$ doubly exponentially, the right-hand side of (32) is strictly smaller than 1 if $n$ is sufficiently large, which implies that there exists one realization of $\pi_n^{(1)}, \ldots, \pi_n^{(m_n)}$ such that

\[ \frac{1}{m_n} \sum_{i=1}^{m_n} I(\pi_n^{(i)}, x^n, y^n) \le \bar{I}(x^n, y^n) + \gamma \tag{33} \]

for every $(x^n, y^n)$. Finally, by taking the average of both sides of (33) with respect to $(X^n, Y^n)$, we have the assertion of the lemma. ■

Finally, by combining Lemma 19 and Lemma 20, and by taking the constants to be sufficiently small and $n$ to be sufficiently large, we can show (12). ■
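To see how quickly the union bound in the de-randomization step drops below 1, the following minimal sketch evaluates the logarithm of $|\mathcal{X}|^n |\mathcal{Y}|^n \exp\{-(\gamma^2/4) m_n\}$ with $m_n = 2^{\delta_2 n}$. The function names and the numerical values of $\gamma$ and $\delta_2$ are illustrative assumptions, not taken from the paper.

```python
import math

def log_union_bound(n, size_x, size_y, gamma, delta2):
    """Natural log of |X|^n |Y|^n exp(-(gamma^2 / 4) * m_n), with m_n = 2^(delta2 * n)."""
    m_n = 2.0 ** (delta2 * n)
    return n * math.log(size_x * size_y) - (gamma ** 2 / 4.0) * m_n

def smallest_good_blocklength(size_x, size_y, gamma, delta2, max_n=10_000):
    """Smallest n at which the union bound is below 1, or None if none is found.

    Note: 2.0 ** (delta2 * n) overflows a float once delta2 * n exceeds about 1024,
    so max_n should be chosen with delta2 in mind.
    """
    for n in range(1, max_n + 1):
        if log_union_bound(n, size_x, size_y, gamma, delta2) < 0:
            return n
    return None

# Binary alphabets, gamma = 0.1, delta2 = 0.1 (illustrative values).
n0 = smallest_good_blocklength(2, 2, 0.1, 0.1)
print(n0, log_union_bound(n0, 2, 2, 0.1, 0.1))
```

The first term grows linearly in $n$ while the second grows exponentially, which is the reason an exponentially large (but rate-negligible, for small $\delta_2$) set of permutations suffices.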

V. Proof of Theorem 8

A. Proof of Converse Part

We only prove (16) because (17) is obtained from (16) by letting $E_1 = 0$ and $W_1 = W^*$.
Assume that $R$ is achievable and fix $\lambda$, $E_1$, $E_2$, $W_1$, and $W_2$ such that $\lambda E_1 + (1 - \lambda) E_2 \le E$ and $W_j \in \mathcal{W}_i(E_j)$ for $j = 1, 2$. To prove (16), it is sufficient to show that there exists a pair $(D_1, D_2)$ such that $\lambda D_1 + (1 - \lambda) D_2 \le D$ and $R \ge R_{HB}(D_1, D_2 | W_1, W_2)$.

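To do this, we consider the compound channel

\[ W^n = \lambda W_1^{\times n} + (1 - \lambda) W_2^{\times n}. \]

Note that $\mathbf{W} = \{W^n\}_{n=1}^{\infty} \in \mathcal{W}_a(E)$ since

\[ \begin{aligned}
e_n(P_X^n, W^n) &= \sum_{x^n, y^n} P_X^n(x^n) W^n(y^n|x^n) e_n(x^n, y^n) \\
&= \lambda e_n(P_X^n, W_1^{\times n}) + (1 - \lambda) e_n(P_X^n, W_2^{\times n}) \\
&\le \lambda E_1 + (1 - \lambda) E_2 \\
&\le E.
\end{aligned} \]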

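Hence, by the definition of the achievability of $R$, for arbitrarily small $\varepsilon > 0$ and sufficiently large $n$, there exists a code $(\varphi_n, \psi_n)$ such that

\[ \frac{1}{n} \log |\mathcal{M}_n| \le R + \varepsilon \]

and

\[ \sum_{x^n, y^n} P_X^n(x^n) W^n(y^n|x^n) d_n(x^n, \psi_n(\varphi_n(x^n), y^n)) \le D + \varepsilon. \tag{34} \]

Note that (34) can also be written as

\[ D + \varepsilon \ge \lambda \sum_{x^n, y^n} P_X^n(x^n) W_1^{\times n}(y^n|x^n) d_n(x^n, \psi_n(\varphi_n(x^n), y^n)) \tag{35} \]

\[ \quad + (1 - \lambda) \sum_{x^n, y^n} P_X^n(x^n) W_2^{\times n}(y^n|x^n) d_n(x^n, \psi_n(\varphi_n(x^n), y^n)). \tag{36} \]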

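On the other hand, by using $(\varphi_n, \psi_n)$, we can construct an HB code $(\varphi_n^{HB}, \psi_n^{HB,1}, \psi_n^{HB,2})$ with the encoder $\varphi_n^{HB} = \varphi_n$ and the decoders

\[ \begin{aligned}
\psi_n^{HB,1}(m, y_1^n) &= \psi_n(m, y_1^n), \quad m \in \mathcal{M}_n,\ y_1^n \in \mathcal{Y}^n, \\
\psi_n^{HB,2}(m, y_2^n) &= \psi_n(m, y_2^n), \quad m \in \mathcal{M}_n,\ y_2^n \in \mathcal{Y}^n.
\end{aligned} \]

Then, let $(D_1, D_2)$ be the pair of average distortions incurred by $(\varphi_n^{HB}, \psi_n^{HB,1}, \psi_n^{HB,2})$, i.e.,

\[ D_j := \sum_{x^n, y^n} P_X^n(x^n) W_j^{\times n}(y^n|x^n) d_n(x^n, \psi_n^{HB,j}(\varphi_n^{HB}(x^n), y^n)), \quad j = 1, 2. \tag{37} \]

By the definition of $R_{HB}(D_1, D_2 | W_1, W_2)$ and the construction of the code, we have

\[ R + \varepsilon \ge R_{HB}(D_1, D_2 | W_1, W_2). \]

Further, (36) and (37) indicate

\[ D + \varepsilon \ge \lambda D_1 + (1 - \lambda) D_2. \]

Since we can choose $\varepsilon$ arbitrarily small, we have $R \ge R_{HB}(D_1, D_2 | W_1, W_2)$ and $\lambda D_1 + (1 - \lambda) D_2 \le D$. ■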
B. Proof of Direct Part 

As the proof of the direct part of Theorem 8, we prove (18) in three steps. First, we construct a code for i.i.d. channels. Then, it is applied to permutation invariant channels by using a random permutation. Finally, the size of the randomness is reduced by the de-randomization technique.

1) Code for i.i.d. channel: The goal of this section is to show the following lemma. 

Lemma 21: For arbitrarily fixed $V \in \mathcal{V}(E, D)$ and $\delta > 0$, there exists $\mu > 0$ and a code $\varphi'_n : \mathcal{X}^n \to \mathcal{U}^n$ such that

\[ \frac{1}{n} \log |\varphi'_n| \le I(P_X, V) + 2\delta \tag{38} \]

and

\[ \Pr\{ (U^n, X^n, Y^n) \notin T_{P_X V W, \delta} \} \le 2^{-\mu n} \]

for every $W \in \mathcal{P}_n(\mathcal{Y}|\mathcal{X})$ provided that $n$ is sufficiently large, where $|\varphi'_n|$ is the cardinality of the image of $\varphi'_n$, $U^n = \varphi'_n(X^n)$, and $T_{P_X V W, \delta}$ is the set of $P_{UXY}$-typical sequences with respect to $P_{UXY}(u, x, y) = P_X(x) V(u|x) W(y|x)$.

Proof: We construct a code in a similar manner as Section IV-B1. We use two kinds of bin codings $f_n : \mathcal{U}^n \to \mathcal{S}_n$ and $g_n : \mathcal{U}^n \to \mathcal{L}_n$. We set

\[ \begin{aligned}
R_f &= H(U|X) - \delta, \\
R_g &= I(P_X, V) + 2\delta.
\end{aligned} \]

Let $|\mathcal{S}_n| = \lfloor 2^{n R_f} \rfloor$ and $|\mathcal{L}_n| = \lceil 2^{n R_g} \rceil$.
Since

\[ R_f + R_g = H(U) + \delta, \]

there exists a decoder $\kappa_n : \mathcal{S}_n \times \mathcal{L}_n \to \mathcal{U}^n$ and $\mu_1 > 0$ such that

\[ E_{F_n G_n}[ P_{err}(F_n, G_n) ] \le 2^{-\mu_1 n} \tag{39} \]

for sufficiently large $n$, where $P_{err}(F_n, G_n)$ is the error probability of the source coding when the bin codings $(F_n, G_n)$ are used. Furthermore, since $R_f = H(U|X) - \delta$, $S_n = F_n(U^n)$ is close to the uniform random variable that is independent of $X^n$ (Lemma 27).
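The rate split used above can be verified on a small example. The sketch below is an illustration only: it assumes a binary source with a binary test channel $V$ given as a row-stochastic matrix, and the function names `entropy` and `rates` are hypothetical. It computes $R_f = H(U|X) - \delta$ and $R_g = I(P_X, V) + 2\delta$ and checks that they sum to $H(U) + \delta$.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def rates(p_x, V, delta):
    """Return (R_f, R_g, H(U)) for the source p_x and test channel V[x][u]."""
    n_u = len(V[0])
    p_u = [sum(p_x[x] * V[x][u] for x in range(len(p_x))) for u in range(n_u)]
    h_u = entropy(p_u)
    h_u_given_x = sum(p_x[x] * entropy(V[x]) for x in range(len(p_x)))
    mutual_info = h_u - h_u_given_x      # I(P_X, V)
    r_f = h_u_given_x - delta            # rate of the bin coding f_n
    r_g = mutual_info + 2 * delta        # rate of the bin coding g_n
    return r_f, r_g, h_u

# Illustrative values: uniform binary source, binary symmetric test channel.
p_x = [0.5, 0.5]
V = [[0.9, 0.1], [0.1, 0.9]]
delta = 0.01
r_f, r_g, h_u = rates(p_x, V, delta)
print(r_f, r_g, r_f + r_g, h_u + delta)
```

The total rate exceeds $H(U)$ by $\delta$, which is what allows $U^n$ to be recovered from the two bin indices, while $R_f$ stays below $H(U|X)$ so that $S_n$ is nearly independent of $X^n$.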
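We construct a code as follows. Let

\[ P_{\hat{U}^n | S_n L_n} \]

be the distribution describing the decoder. Let

\[ P_{S_n L_n U^n X^n Y^n \hat{U}^n}(s_n, \ell_n, u^n, x^n, y^n, \hat{u}^n) = P_{S_n X^n}(s_n, x^n) P_{U^n|S_n X^n}(u^n|s_n, x^n) P_{L_n|U^n}(\ell_n|u^n) P_{Y^n|X^n}(y^n|x^n) P_{\hat{U}^n|S_n L_n}(\hat{u}^n|s_n, \ell_n) \]

and

\[ \tilde{P}_{S_n L_n U^n X^n Y^n \hat{U}^n}(s_n, \ell_n, u^n, x^n, y^n, \hat{u}^n) = \bar{P}_{S_n}(s_n) P_{X^n}(x^n) P_{U^n|S_n X^n}(u^n|s_n, x^n) P_{L_n|U^n}(\ell_n|u^n) P_{Y^n|X^n}(y^n|x^n) P_{\hat{U}^n|S_n L_n}(\hat{u}^n|s_n, \ell_n). \]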

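Note that $P_{U^n|S_n X^n}$ is a randomized quantizer. From Lemma 27 and the fact that the variational distance does not increase by data processing and marginalization, we have

\[ E_{F_n G_n}\bigl[ \| P_{S_n L_n U^n X^n Y^n \hat{U}^n} - \tilde{P}_{S_n L_n U^n X^n Y^n \hat{U}^n} \| \bigr] \le 2^{-\mu_2 n} \]

for some $\mu_2 > 0$. By Lemma 26 and (39), we have

\[ \begin{aligned}
&E_{F_n G_n}\bigl[ P_{S_n L_n U^n X^n Y^n \hat{U}^n}(\{ (\hat{u}^n, x^n, y^n) \notin T_{P_X V W, \delta} \}) \bigr] \\
&\le E_{F_n G_n}\bigl[ P_{S_n L_n U^n X^n Y^n \hat{U}^n}(\{ (u^n, x^n, y^n) \notin T_{P_X V W, \delta} \text{ or } \hat{u}^n \neq u^n \}) \bigr] \\
&\le 2^{-\mu_3 n}
\end{aligned} \]

for some $\mu_3 > 0$. Since

\[ | P_{S_n L_n U^n X^n Y^n \hat{U}^n}(\mathcal{A}) - \tilde{P}_{S_n L_n U^n X^n Y^n \hat{U}^n}(\mathcal{A}) | \le \| P_{S_n L_n U^n X^n Y^n \hat{U}^n} - \tilde{P}_{S_n L_n U^n X^n Y^n \hat{U}^n} \| \]

for any set $\mathcal{A}$, we have

\[ E_{F_n G_n}\bigl[ \tilde{P}_{S_n L_n U^n X^n Y^n \hat{U}^n}(\{ (\hat{u}^n, x^n, y^n) \notin T_{P_X V W, \delta} \}) \bigr] \le 2^{-\mu_4 n} \]

for some $\mu_4 > 0$. Since the cardinality of $\mathcal{P}_n(\mathcal{Y}|\mathcal{X})$ is bounded by $(n+1)^{|\mathcal{X}||\mathcal{Y}|}$, there exists one realization $(f_n, g_n, s_n)$ of $(F_n, G_n, S_n)$ satisfying

\[ \tilde{P}_{U^n X^n Y^n \hat{U}^n | S_n}(\{ (\hat{u}^n, x^n, y^n) \notin T_{P_X V W, \delta} \} | s_n) \le (n+1)^{|\mathcal{X}||\mathcal{Y}|} 2^{-\mu_4 n} \]

for every $W \in \mathcal{P}_n(\mathcal{Y}|\mathcal{X})$. Furthermore, let $K_n$ be a random variable that simulates the randomized quantizer $P_{U^n|S_n X^n}$. Then, we can also eliminate this randomness in a similar manner. Let $\tau_n : \mathcal{S}_n \times \mathcal{X}^n \to \mathcal{U}^n$ be the resulting deterministic quantizer. Then, we set $\varphi'_n(x^n) = \kappa_n(s_n, g_n(\tau_n(s_n, x^n)))$. The image size of $\varphi'_n$ obviously satisfies (38). Thus, by taking $n$ sufficiently large, we have the assertion of the lemma. ■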
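2) Code for Permutation Invariant Channel:

Lemma 22: For $W \in \mathcal{W}_n(T_{X,\delta})$, let $A_n(W) \subset \mathcal{X}^n \times \mathcal{Y}^n$. Suppose that

\[ P_X^n \cdot W^{\times n}(A_n(W)^c) \le \varepsilon \]

for every $W \in \mathcal{W}_n(T_{X,\delta})$. Then, for any conditional type $W \in \mathcal{W}_n(T_{X,\delta})$, we have

\[ \sum_{x^n \in T_{X,\delta}} P_X^n(x^n) 1[W \in \mathcal{P}_n(\mathcal{Y}|P_{x^n})] \sum_{y^n \in T_W(x^n)} \frac{1}{|T_W(x^n)|} 1[(x^n, y^n) \in A_n(W)^c] \le (n+1)^{|\mathcal{X}||\mathcal{Y}|} \varepsilon. \]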

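Proof: We prove this lemma in a similar manner as Lemma 17. For any $W \in \mathcal{P}_n(\mathcal{Y}|P_{x^n})$, note that (30) holds. Then, for any $W \in \mathcal{W}_n(T_{X,\delta})$, we have

\[ \begin{aligned}
\varepsilon &\ge P_X^n \cdot W^{\times n}(A_n(W)^c) \\
&\ge \sum_{x^n \in T_{X,\delta}} P_X^n(x^n) 1[W \in \mathcal{P}_n(\mathcal{Y}|P_{x^n})] \sum_{y^n \in T_W(x^n)} W^{\times n}(T_W(x^n)|x^n) \frac{1}{|T_W(x^n)|} 1[(x^n, y^n) \in A_n(W)^c] \\
&\ge \sum_{x^n \in T_{X,\delta}} P_X^n(x^n) 1[W \in \mathcal{P}_n(\mathcal{Y}|P_{x^n})] \sum_{y^n \in T_W(x^n)} \frac{1}{(n+1)^{|\mathcal{X}||\mathcal{Y}|}} \frac{1}{|T_W(x^n)|} 1[(x^n, y^n) \in A_n(W)^c],
\end{aligned} \]

which implies the statement of the lemma. ■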

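Lemma 23: For a given $V \in \mathcal{V}(E + 2\delta e_{\max}, D)$, suppose that there exists $\varphi'_n : \mathcal{X}^n \to \mathcal{U}^n$ such that

\[ \Pr\{ (U^n, X^n, Y^n) \notin T_{P_X V W, \delta} \} \le \varepsilon \]

for every $W \in \mathcal{P}_n(\mathcal{Y}|\mathcal{X})$, where $U^n = \varphi'_n(X^n)$ and $T_{P_X V W, \delta}$ is the set of all $P_{UXY}$-typical sequences for $P_{UXY}(u, x, y) = P_X(x) V(u|x) W(y|x)$. Then, we have

\[ E[ d_n(X^n, U^n(Y^n)) ] \le \{ P_X^n(T_{X,\delta}^c) + (n+1)^{2|\mathcal{X}||\mathcal{Y}|} \varepsilon + \delta \} d_{\max} + D \]

for every permutation invariant (not necessarily i.i.d.) $W^n$ such that

\[ E[ e_n(X^n, Y^n) ] \le E \]

provided that $n$ is sufficiently large.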

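Proof: From (46), we first note that

\[ d_n(x^n, u^n(y^n)) \le d(V, W) + \delta d_{\max} \]

for $(u^n, x^n, y^n) \in T_{P_X V W, \delta}$. Then, by using Lemma 22 for

\[ A_n(W) = \{ (x^n, y^n) : (\varphi'_n(x^n), x^n, y^n) \in T_{P_X V W, \delta} \}, \]

we have

\[ \begin{aligned}
&E[ d_n(X^n, U^n(Y^n)) ] \\
&\le P_X^n(T_{X,\delta}^c) d_{\max} \\
&\quad + \sum_{W \in \mathcal{W}_n(T_{X,\delta})} \sum_{x^n \in T_{X,\delta}} P_X^n(x^n) 1[W \in \mathcal{P}_n(\mathcal{Y}|P_{x^n})] \sum_{y^n \in T_W(x^n)} W^n(T_W(x^n)|x^n) \frac{1}{|T_W(x^n)|} 1[(x^n, y^n) \in A_n(W)^c] d_{\max} \\
&\quad + \sum_{W \in \mathcal{W}_n(T_{X,\delta})} \sum_{x^n \in T_{X,\delta}} P_X^n(x^n) 1[W \in \mathcal{P}_n(\mathcal{Y}|P_{x^n})] \sum_{y^n \in T_W(x^n)} W^n(T_W(x^n)|x^n) \frac{1}{|T_W(x^n)|} 1[(x^n, y^n) \in A_n(W)] \{ d(V, W) + \delta d_{\max} \} \\
&\le \{ P_X^n(T_{X,\delta}^c) + (n+1)^{2|\mathcal{X}||\mathcal{Y}|} \varepsilon + \delta \} d_{\max} + \sum_{x^n \in T_{X,\delta}} \sum_{W \in \mathcal{P}_n(\mathcal{Y}|P_{x^n})} P_X^n(x^n) W^n(T_W(x^n)|x^n) d(V, W),
\end{aligned} \]

where we used Lemma 22 to upper bound the second term in the second inequality. Now, we rewrite the last term as

\[ \sum_{x^n \in T_{X,\delta}} \sum_{W \in \mathcal{P}_n(\mathcal{Y}|P_{x^n})} P_X^n(x^n) W^n(T_W(x^n)|x^n) d(V, W) = P_X^n(T_{X,\delta}) d(V, W_{mix}), \]

where $W_{mix} \in \mathcal{P}(\mathcal{Y}|\mathcal{X})$ is a channel defined by

\[ W_{mix}(y|x) = \sum_{x^n \in T_{X,\delta}} \tilde{P}_X^n(x^n) \sum_{W \in \mathcal{P}_n(\mathcal{Y}|P_{x^n})} W^n(T_W(x^n)|x^n) W(y|x) \]

for

\[ \tilde{P}_X^n(x^n) = \frac{P_X^n(x^n)}{P_X^n(T_{X,\delta})}. \]

From (43) and (44), we have

\[ e_n(x^n, y^n) \ge e(P_X, W) - \delta e_{\max} \]

for $x^n \in T_{X,\delta}$ and $y^n \in T_W(x^n)$. Thus, we have

\[ \begin{aligned}
E &\ge E[ e_n(X^n, Y^n) ] \\
&= \sum_{x^n, y^n} P_X^n(x^n) W^n(y^n|x^n) e_n(x^n, y^n) \\
&\ge \sum_{x^n \in T_{X,\delta}} P_X^n(x^n) \sum_{W \in \mathcal{P}_n(\mathcal{Y}|P_{x^n})} W^n(T_W(x^n)|x^n) e(P_{x^n}, W) \\
&\ge \sum_{x^n \in T_{X,\delta}} P_X^n(x^n) \sum_{W \in \mathcal{P}_n(\mathcal{Y}|P_{x^n})} W^n(T_W(x^n)|x^n) \{ e(P_X, W) - \delta e_{\max} \} \\
&= P_X^n(T_{X,\delta}) \{ e(P_X, W_{mix}) - \delta e_{\max} \}.
\end{aligned} \]

Thus, we have $W_{mix} \in \mathcal{W}_i(E + 2\delta e_{\max})$ provided that $n$ is sufficiently large. Since $V \in \mathcal{V}(E + 2\delta e_{\max}, D)$, we have $d(V, W_{mix}) \le D$. This completes the proof. ■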
By combining Lemma 21 and Lemma 23, we have the following.

Lemma 24: For any $V \in \mathcal{V}(E + 2\delta e_{\max}, D)$, $\delta > 0$, and $\varepsilon > 0$, there exists $\varphi'_n : \mathcal{X}^n \to \mathcal{U}^n$ such that

\[ \frac{1}{n} \log |\varphi'_n| \le I(P_X, V) + \delta \]

and

\[ E[ d_n(\pi_n(X^n), U^n(\pi_n(Y^n))) ] \le D + \varepsilon \tag{40} \]

for every $W \in \mathcal{W}_m(E)$ provided that $n$ is sufficiently large, where $U^n = \varphi'_n(\pi_n(X^n))$.

3) De-Randomization: Now we reduce the size of random permutation by using the de-randomization technique. 

Lemma 25: Suppose that $\varphi'_n$ satisfies (40). Then, for arbitrary $\delta_2, \gamma > 0$, there exist $m_n = 2^{\delta_2 n}$ permutations $\{\pi_n^{(1)}, \ldots, \pi_n^{(m_n)}\}$ such that

\[ \frac{1}{m_n} \sum_{i=1}^{m_n} E\bigl[ d_n(\pi_n^{(i)}(X^n), U^n(\pi_n^{(i)}(Y^n))) \bigr] \le D + \varepsilon + \gamma \]

provided that $n$ is sufficiently large.

Proof: For a permutation $\pi_n$ and $(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n$, we denote

\[ J(\pi_n, x^n, y^n) = d_n(\pi_n(x^n), u^n(\pi_n(y^n))), \]

where $u^n = \varphi'_n(\pi_n(x^n))$. Let $\pi_n^{(1)}, \ldots, \pi_n^{(m_n)}$ be randomly generated permutations, and let $\bar{J}(x^n, y^n) = E_{\pi_n}[J(\pi_n, x^n, y^n)]$. Then, by using Lemma 28 for $A_i = J(\pi_n^{(i)}, x^n, y^n)$, $b = d_{\max}$, and $\alpha = \gamma / (2 d_{\max}^2)$, we have

\[ \Pr\Bigl\{ \frac{1}{m_n} \sum_{i=1}^{m_n} J(\pi_n^{(i)}, x^n, y^n) > \bar{J}(x^n, y^n) + \gamma \Bigr\} \le \exp\{ -(\gamma^2 / 4 d_{\max}^2) m_n \}. \]

Furthermore, by using the union bound, we have

\[ \Pr\Bigl\{ \exists (x^n, y^n) : \frac{1}{m_n} \sum_{i=1}^{m_n} J(\pi_n^{(i)}, x^n, y^n) > \bar{J}(x^n, y^n) + \gamma \Bigr\} \le |\mathcal{X}^n||\mathcal{Y}^n| \exp\{ -(\gamma^2 / 4 d_{\max}^2) m_n \}. \tag{41} \]

Since $\exp\{ -(\gamma^2 / 4 d_{\max}^2) m_n \}$ converges to $0$ doubly exponentially, the right-hand side of (41) is strictly smaller than 1 if $n$ is sufficiently large, which implies that there exists one realization of $\pi_n^{(1)}, \ldots, \pi_n^{(m_n)}$ such that

\[ \frac{1}{m_n} \sum_{i=1}^{m_n} J(\pi_n^{(i)}, x^n, y^n) \le \bar{J}(x^n, y^n) + \gamma \tag{42} \]

for every $(x^n, y^n)$. Finally, by taking the average of both sides of (42) with respect to $(X^n, Y^n)$, we have the assertion of the lemma. ■

Finally, by combining Lemma 24 and Lemma 25, and by taking the constants to be sufficiently small and $n$ to be sufficiently large, we can show that the right-hand side of (18) is achievable. ■
VI. Conclusion

In this paper, we introduced novel rate-distortion functions for the Wyner-Ziv problem, which are defined as the minimum rates required for universal coding over distortion constrained general channel classes. Then, we derived upper bounds and lower bounds on the rate-distortion functions. A complete characterization of the rate-distortion functions remains open. Part of the difficulty is related to the Heegard-Berger problem, which is also a long-standing open problem.

Appendix 

A. Miscellaneous Facts on Types and Typicality 

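In this section, we introduce some notation and known facts on the method of types [32].

The type of a sequence $x^n$ and the joint type of $(x^n, y^n)$ are denoted by $P_{x^n}$ and $P_{x^n y^n}$, respectively. The sets of all types and of all joint types are denoted by $\mathcal{P}_n(\mathcal{X})$ and $\mathcal{P}_n(\mathcal{X} \times \mathcal{Y})$. For a type $P$, the set of all sequences such that $P_{x^n} = P$ is denoted by $T_P$. We use a similar notation for joint types. The set of all conditional types is denoted by $\mathcal{P}_n(\mathcal{Y}|\mathcal{X})$, and the $W$-shell for a given $x^n$ is denoted by $T_W(x^n)$. For a type $P \in \mathcal{P}_n(\mathcal{X})$, the set of all conditional types $W$ such that $T_W(x^n)$ is not empty for $x^n \in T_P$ is denoted by $\mathcal{P}_n(\mathcal{Y}|P)$. It is well known that

\[ \begin{aligned}
|\mathcal{P}_n(\mathcal{X})| &\le (n+1)^{|\mathcal{X}|}, \\
|\mathcal{P}_n(\mathcal{X} \times \mathcal{Y})| &\le (n+1)^{|\mathcal{X}||\mathcal{Y}|}, \\
|\mathcal{P}_n(\mathcal{Y}|\mathcal{X})| &\le (n+1)^{|\mathcal{X}||\mathcal{Y}|},
\end{aligned} \]

and these inequalities are used extensively in the paper.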
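The type-counting bound can be confirmed by brute force for small parameters. The following sketch is an illustration under assumed values ($n = 4$, a ternary alphabet); the function name `types` is hypothetical. It enumerates all sequences, collects their types, and compares the count with $(n+1)^{|\mathcal{X}|}$ and with the exact count $\binom{n + |\mathcal{X}| - 1}{|\mathcal{X}| - 1}$.

```python
import itertools
from collections import Counter
from math import comb

def types(n, alphabet):
    """Set of all types of length-n sequences, each given as a vector of symbol counts."""
    found = set()
    for seq in itertools.product(alphabet, repeat=n):
        counts = Counter(seq)
        found.add(tuple(counts[a] for a in alphabet))
    return found

n = 4
alphabet = ("a", "b", "c")
num_types = len(types(n, alphabet))
exact = comb(n + len(alphabet) - 1, len(alphabet) - 1)  # stars-and-bars count
bound = (n + 1) ** len(alphabet)
print(num_types, exact, bound)
```

The number of types is polynomial in $n$, whereas the number of sequences is exponential; this gap is what makes the union bounds over conditional types in the proofs affordable.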

For $P_X \in \mathcal{P}(\mathcal{X})$, a sequence $x^n$ is called a $P_X$-typical sequence with constant $\delta$ if

\[ |P_{x^n}(a) - P_X(a)| \le \delta \quad \forall a \in \mathcal{X} \]

and no $a \in \mathcal{X}$ with $P_X(a) = 0$ occurs in $x^n$. The set of all typical sequences is denoted by $T_{X,\delta}$. The set of all types $P \in \mathcal{P}_n(\mathcal{X})$ such that $T_P \subset T_{X,\delta}$ is denoted by $\mathcal{P}_{X,\delta,n}$. For a joint probability distribution, jointly typical sequences and the set of all jointly typical sequences are defined in a similar manner. It is well known that the set of all non-typical sequences occurs with exponentially small probability. For our purpose, we especially need a bound whose convergence is uniform with respect to $P_X$.
Lemma 26: For any $P_X \in \mathcal{P}(\mathcal{X})$, we have

P x (Tl s )<2\X\2- n ^. 

Proof: For each $a \in \mathcal{X}$ such that $P_X(a) > 0$, by noting that the variance of $1[X_i = a] - P_X(a)$ is bounded by $\frac{1}{4}$ and $|1[X_i = a] - P_X(a)| \le 1$ with probability one, and by using the Bernstein inequality, we have

Pv{\P X n(a) - P x (a)\ >S}<2- 2~ n r& 

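for any $0 < \delta < 1$. Thus, by using the union bound with respect to $a \in \mathcal{X}$, we have the assertion. ■

Since the distortion is additive, the distortion between $x^n$ and $y^n$ only depends on their joint type, and thus we have

\[ e_n(x^n, y^n) = e(P, W) \tag{43} \]

if $x^n \in T_P$ and $y^n \in T_W(x^n)$. From the definition of $\mathcal{P}_{X,\delta,n}$, we have

\[ |e(P, W) - e(P_X, W)| \le \delta e_{\max} \tag{44} \]

for any $P \in \mathcal{P}_{X,\delta,n}$ and $W \in \mathcal{P}(\mathcal{Y}|\mathcal{X})$.
For $P \in \mathcal{P}_n(\mathcal{X})$, let

\[ \mathcal{W}_n(P, E) := \mathcal{W}_i(P, E) \cap \mathcal{P}_n(\mathcal{Y}|P). \]

Then, for $P \in \mathcal{P}_{X,\delta,n}$, (44) implies

\[ W \in \mathcal{W}_n(P, E) \implies W \in \mathcal{W}_i(E + \delta e_{\max}). \tag{45} \]

We also use the notation

\[ \begin{aligned}
\mathcal{W}_n(T_{X,\delta}, E) &:= \bigcup_{P \in \mathcal{P}_{X,\delta,n}} \mathcal{W}_n(P, E), \\
\mathcal{W}_n(T_{X,\delta}) &:= \bigcup_{P \in \mathcal{P}_{X,\delta,n}} \mathcal{P}_n(\mathcal{Y}|P).
\end{aligned} \]

For $(V, W) \in \mathcal{P}(\mathcal{U}|\mathcal{X}) \times \mathcal{P}(\mathcal{Y}|\mathcal{X})$, let $P_{UXY}(u, x, y) = P_X(x) V(u|x) W(y|x)$. For $(u^n, x^n, y^n) \in T_{UXY,\delta}$, by the same reason as (43) and (44), we have

\[ |d_n(x^n, u^n(y^n)) - d(V, W)| \le \delta d_{\max}. \tag{46} \]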
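The fact behind (43), that an additive distortion depends on $(x^n, y^n)$ only through the joint type, can be illustrated with exact arithmetic. The sketch below assumes the Hamming distortion on a binary alphabet and uses hypothetical function names; it computes the per-letter distortion directly and again from the joint type alone.

```python
from collections import Counter
from fractions import Fraction

def per_letter_distortion(x, y, e):
    """e_n(x^n, y^n) = (1/n) * sum_i e(x_i, y_i)."""
    return sum(Fraction(e[(a, b)]) for a, b in zip(x, y)) / len(x)

def distortion_from_joint_type(x, y, e):
    """The same quantity computed only from the joint type of (x^n, y^n)."""
    n = len(x)
    joint_counts = Counter(zip(x, y))
    return sum(Fraction(c, n) * Fraction(e[pair]) for pair, c in joint_counts.items())

# Hamming distortion on a binary alphabet (illustrative choice).
hamming = {(a, b): int(a != b) for a in (0, 1) for b in (0, 1)}
x = (0, 0, 1, 1, 0)
y = (0, 1, 1, 0, 0)
print(per_letter_distortion(x, y, hamming), distortion_from_joint_type(x, y, hamming))
```

Applying the same permutation to both sequences leaves the joint type, and hence the distortion, unchanged, which is the property used in the permutation arguments of Sections IV and V.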

B. Privacy Amplification Lemma 

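Lemma 27: Let $F_n$ be the random binning from $\mathcal{U}^n$ to $\mathcal{S}_n$ such that $|\mathcal{S}_n| = \lfloor 2^{n R_f} \rfloor$, where $R_f = H(U|X) - \delta$. Then, there exists $\mu_2 > 0$ such that

\[ E_{F_n}\bigl[ \| P_{S_n X^n} - \bar{P}_{S_n} \times P_{X^n} \| \bigr] \le 2^{-\mu_2 n}, \]

where $\bar{P}_{S_n}$ is the uniform distribution on $\mathcal{S}_n$.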

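Proof: The lemma is a straightforward consequence of [33, (51)], which states that

\[ E_{F_n}\bigl[ \| P_{S_n X^n} - \bar{P}_{S_n} \times P_{X^n} \| \bigr] \le 3 |\mathcal{S}_n|^{\theta} 2^{n \tau(\theta | P_{UX})} \tag{47} \]

for $0 < \theta < \frac{1}{2}$, where

\[ \tau(\theta | P_{UX}) = \log \sum_{x} P_X(x) \Bigl[ \sum_{u} P_{U|X}(u|x)^{\frac{1}{1-\theta}} \Bigr]^{1-\theta}. \]

Since

\[ \frac{d \tau(\theta | P_{UX})}{d\theta} \Big|_{\theta = 0} = -H(U|X), \]

there exists $\theta_0 > 0$ such that

\[ \frac{\tau(\theta_0 | P_{UX})}{\theta_0} \le -H(U|X) + \frac{\delta}{2}. \]

Thus, we have

\[ \frac{1}{n} \log |\mathcal{S}_n| + \frac{\tau(\theta_0 | P_{UX})}{\theta_0} \le R_f - H(U|X) + \frac{\delta}{2} = -\frac{\delta}{2}. \tag{48} \]

Combining (47) and (48), we have the assertion of the lemma. ■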
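The derivative property of $\tau$ used in the proof can be sanity-checked numerically. The sketch below is an illustration only, with an assumed binary source and test channel and hypothetical function names; it compares a forward finite difference of $\tau(\theta | P_{UX})$ at $\theta = 0$ with $-H(U|X)$, using base-2 logarithms throughout.

```python
import math

def tau(theta, p_x, V):
    """tau(theta | P_UX) = log2 sum_x P_X(x) [sum_u V(u|x)^(1/(1-theta))]^(1-theta)."""
    total = 0.0
    for px, row in zip(p_x, V):
        inner = sum(v ** (1.0 / (1.0 - theta)) for v in row if v > 0)
        total += px * inner ** (1.0 - theta)
    return math.log2(total)

def conditional_entropy(p_x, V):
    """H(U|X) in bits for the source p_x and the channel V[x][u]."""
    return -sum(px * v * math.log2(v) for px, row in zip(p_x, V) for v in row if v > 0)

# Illustrative distribution (an assumption, not from the paper).
p_x = [0.3, 0.7]
V = [[0.9, 0.1], [0.2, 0.8]]
h = 1e-6
slope = (tau(h, p_x, V) - tau(0.0, p_x, V)) / h
print(slope, -conditional_entropy(p_x, V))
```

Because $\tau(0 | P_{UX}) = 0$, the ratio $\tau(\theta | P_{UX}) / \theta$ approaches this slope as $\theta \to 0$, which is how $\theta_0$ is chosen in the proof.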



C. Bernstein's Trick 

Lemma 28 ([28]): Let $A_1, \ldots, A_m$ be a sequence of discrete independent random variables that take values in $[-b, b]$. Then, for $0 < \alpha < \min[1, c\, e^{-2b}]$, where the factor $c$ is the one specified in [28], we have

\[ \Pr\Bigl\{ \frac{1}{m} \sum_{i=1}^{m} (A_i - E[A_i]) > \gamma \Bigr\} \le \exp\{ (-\alpha \gamma + \alpha^2 b^2) m \}. \]
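The exponents used in the de-randomization lemmas follow from minimizing the quadratic $-\alpha\gamma + \alpha^2 b^2$ over $\alpha$. The sketch below checks this arithmetic exactly with rational numbers; the function names are hypothetical, and it is assumed (not verified here) that the minimizing $\alpha$ lies in the admissible range of Lemma 28, which holds for sufficiently small $\gamma$.

```python
from fractions import Fraction

def bernstein_exponent(alpha, gamma, b):
    """Per-sample exponent -alpha*gamma + alpha^2 * b^2 from Lemma 28."""
    return -alpha * gamma + alpha ** 2 * b ** 2

def best_alpha(gamma, b):
    """Minimizer of the quadratic exponent: alpha = gamma / (2 b^2)."""
    return gamma / (2 * b ** 2)

gamma = Fraction(1, 10)

# Case b = 1 (indicator variables): alpha = gamma/2 gives exponent -gamma^2/4.
print(best_alpha(gamma, 1), bernstein_exponent(best_alpha(gamma, 1), gamma, 1))

# Case b = d_max (here d_max = 3 as an example): exponent -gamma^2 / (4 d_max^2).
d_max = Fraction(3)
print(best_alpha(gamma, d_max), bernstein_exponent(best_alpha(gamma, d_max), gamma, d_max))
```

These two cases correspond to the bounds $\exp\{-(\gamma^2/4) m_n\}$ and $\exp\{-(\gamma^2/4d_{\max}^2) m_n\}$ that appear in the proofs of the de-randomization lemmas.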